PWB/UNIX Documentation Roadmap
J. R. Mashey 


Bell Laboratories 
Piscataway, New Jersey 08854 


1. INTRODUCTION 


A great deal of documentation exists for PWB/UNIX. It has different formats, is contributed by many
different people, and is modified frequently. New users are often overwhelmed by the volume and
distributed nature of the documentation. This “roadmap” attempts to be a terse, up-to-date outline of
crucial documents and information sources.


Numerous people have contributed comments and information for this “roadmap,” in order to make it
as helpful as possible for PWB/UNIX users. However, many of these comments are accurate only with regard
to PWB/UNIX and may well be totally inapplicable to other versions of UNIX.


1.1 Things to Do 


See a local PWB/UNIX system administrator to obtain a “login name” and get other appropriate system
information.


1.2 Notation Used in this Roadmap 


{N}   Section N in this “roadmap.”
++    item required for everyone.
+     item recommended for most users.


All other items are optional and depend on specific interest (a list of relevant documents appears in the
Table of Contents of Documents for the PWB/UNIX Time-Sharing System).


Items in Section N of the PWB/UNIX User’s Manual are referred to by name(N).


1.3 Prerequisite Structure of Following Sections


[Diagram: prerequisite structure of the following sections, beginning with {2}.]
2. BASIC INFORMATION


Don’t do anything else until you have learned most of this section. You must know how to log onto 
the system, make your terminal work correctly, enter and edit files, and perform basic operations on 
directories and files. 


2.1 PWB/UNIX User's Manual ++ 


• Read Introduction and How to Get Started.
• Look through Section I to become familiar with command names.
• Note the existence of the Table of Contents and of the Permuted Index.


Section I will be especially needed for reference use.


2.2 UNIX for Beginners ++
2.3 A Tutorial Introduction to the UNIX Text Editor ++ 


2.4 Advanced Editing on UNIX + 

2.5 PWB Papers from the Second International Conference on Software Engineering + 
Gives an overview of the Programmer’s Workbench. 

2.6 Things to Do


Do all the exercises found in {2.2} and {2.3}, and maybe {2.4}.

Create a file named “.mail” in your login directory* so that other people (such as system
administrators) can send you mail. This can be done by:


cp /dev/null .mail


If you want some sequence of commands to be executed each time you log in, create a file named
“.profile” in your login directory containing the commands you want executed. For more
information, see Initialization in sh(I).

Files in directory “/usr/news” contain recent information on various topics. To see what has been
updated recently, type: 


ls -lt /usr/news


and then print any files that look interesting. Other useful actions include:


mail -f /usr/news/.mail    gives recent history from the primary system mailbox.

cat /usr/news/helpers gives contacts and telephone numbers for counseling, file 
restorals, trouble reporting, and other services. 

nroff -mm /usr/news/roadmap    prints the current copy of this “roadmap.”

cat /usr/news/terminals gives recommendations on selection of computer terminals. 


2.7 Manual Pages to Be Studied 


The following commands are described in Section I of the PWB/UNIX User’s Manual, and are used for
creating, editing, moving (i.e., renaming), and removing files: 


cat(I) concatenate and print files (no pagination). 

chdir(I) change working (current) directory; a.k.a. cd(I). 

cp(I) make a copy of an existing file.

ed(I) text editor. 

ls(I) list a directory; file names beginning with “.” are not listed unless the “-a”
flag is used.

mkdir(I) make a (new) directory.

mv(I) move (rename) file.

pr(I) print files (paginated listings).

rm(I) remove (delete) file(s).

rmdir(I) remove directory(ies).


The following help you communicate with other users, make proper use of different kinds of terminals, 
and print manual pages on-line: 


login(I) sign on.

mail(I) send mail to other users; inspect mail from them, or contents of the system
mailbox.

man(I) print pages of PWB/UNIX User’s Manual.

stty(I) set terminal options; i.e., inform the system about the hardware characteristics
of your terminal.

tabs(I) set tab stops on your terminal.

terminals(VII) gives descriptions of commonly-used terminals.

who(I) print list of users currently logged in.

write(I) communicate with another (logged in) user.


* Login directory: the directory you are in right after logging into the system.
3. BASIC TEXT PROCESSING AND DOCUMENT PREPARATION


You should read this section if you want to use existing text processing tools to write letters, 
memoranda, manuals, etc. 


3.1 PWB/MM—Programmer’s Workbench Memorandum Macros ++ 


This is a reference manual, and can be moderately heavy going for a beginner. Try out some of the 
examples, and stick close to the default options. 


3.2 Typing Documents with PWB/MM ++

A handy fold-out.


3.3 NROFF/TROFF User’s Manual +


Describes the text formatting language in great detail; look at the REQUEST SUMMARY, but don’t try to 
digest the whole manual on first reading. 


3.4 Documentation Tools and Techniques + 


This overview of UNIX text processing methods is one of the papers from the Second International 
Conference on Software Engineering. (See {2.5} above). 


3.5 Manual Pages to Be Studied 


mm(I) makes it easy to specify standard options to nroff(I).

nroff(I) read to see formatter option flags.

spell(I) identifies possible spelling errors.

tmac.name(VII) list of text-formatting macro packages.

typo(I) identifies possible typographical errors.


To obtain some special functions (e.g., reverse paper motion, subscripts, superscripts), you must either 
indicate the terminal type to nroff or post-process nroff output through one of the following: 


450(I) newer Diablo printer terminals, such as the DASI450, DIABLO 1620, XEROX 1700,
etc.

col(I) terminals lacking physical reverse motion, such as the Texas Instruments 700
series.

gsi(I) older Diablo printer terminals, such as the GSI300, DASI300, DTC300, etc.

hp(I) Hewlett-Packard 2640 terminals (HP2640A, HP2640B, HP2644A, HP2645A, etc.).


4. SPECIALIZED TEXT PROCESSING 

The tools listed in this section are of a more specialized nature than those in {3}.

4.1 TBL—A Program to Format Tables +

Great help in formatting tabular data (see tbl(I)).

4.2 Typesetting Mathematics— User’s Guide (2nd Edition) + 


Read this if you need to produce mathematical equations. It describes the use of the equation setting
commands eqn(I) and neqn(I).


4.3 A TROFF Tutorial

An introduction to formatting text with the phototypesetter.


4.4 Manual Pages to Be Studied


diffmark(I) marks changes between versions of a file, using output of diff(I) to produce
“revision bars” in the right margin.

eqn(I) preprocessor for mathematical equations (phototypesetter).

neqn(I) preprocessor for mathematical equations (terminals).

tbl(I) preprocessor for tabular data.

troff(I) formatter for phototypesetter.
5. ADVANCED TEXT PROCESSING


You should read this section if you need to design your own package of formatting macros or perform 
other actions beyond the capabilities of existing tools; {3} is a prerequisite, and familiarity with {4} is 
very helpful, as is an experienced advisor. It takes a great deal of effort to write a good package of 
macros for general use. Don’t reinvent what you can borrow from an existing package (such as 
PWB/MM). 


5.1 NROFF/TROFF User’s Manual ++


Look at this in detail and try modifying the examples. If you are going to use the phototypesetter, do 
the same for A TROFF Tutorial ({4.3} above). 


5.2 Things to Do 


It is fairly easy to use the text formatters for simple purposes. A typical application is that of writing 
simple macros that print standard headings in order to eliminate repetitive keying of such headings. It 
is extremely difficult to set up general-purpose macro packages for use by large numbers of people. If 
possible, try to use an existing package or modify one as needed. Look at existing packages first; see
tmac.name(VII).


5.3 Manual Pages to Be Studied

All pages mentioned in {3} and {4}.


6. COMMAND LANGUAGE (SHELL) PROGRAMMING 


The Shell provides a powerful programming language for combining existing commands. This section
should be especially useful to those who want to automate manual procedures and build data bases.


6.1 The UNIX Time-Sharing System ++

6.2 PWB/UNIX Shell Tutorial ++

6.3 Things to Do


If you want to create your own library of commands, create a “.path” file in your login directory, as
described in sh(I).


6.4 Manual Pages to Be Studied 


Read sh(I) first; the following pages give further details on commands that are most frequently used
within command language programs: 


echo(I) echo arguments (typically to terminal).

equals(I) Shell assignment command (for variables).

exit(I) terminate command file.

expr(I) evaluate an algebraic expression.

fd2(I) redirect diagnostic output.

if(I) conditional command.

next(I) read command input from named file.

nohup(I) run a command immune to communications line hang-up.

onintr(I) handle interrupts in Shell files.

pump(I) Shell data transfer command.

sh(I) Shell (command interpreter).

shift(I) adjust Shell arguments.

switch(I) Shell multi-way branch command.

while(I) Shell iteration command.


7. FILE MANIPULATION


In addition to the basic commands of {2}, many UNIX commands exist to perform various kinds of file
manipulation. Small data bases can often be managed quite simply by combining text processing (from
{5}), command language programming (from {6}), and the commands listed below in {7.2}.


7.1 Things to Do 


This “roadmap” notes only the most frequently used commands. It is wise to scan Section I of the
PWB/UNIX User’s Manual periodically; you will often discover new uses for commands.


7.2 Manual Pages to Be Studied

The following are used to search or edit files in a single pass:


grep(I) search a file for a pattern; more powerful and specialized versions include
egrep(I), fgrep(I), and rgrep(I).

sed(I) stream editor.

tr(I) transliterate (substitute or delete specified characters).


The following compare files in different ways: 


cmp(I) compare files (byte by byte).

comm(I) print lines common to two files, or lines that appear in only one of the two files.

diff(I) differential file comparator (minimal editing for conversion).


The following combine files and/or split them apart: 


ar(I) archiver and library maintainer.

cpio(I) general file copying and archiving.

csplit(I) split file by context.

split(I) split file into chunks of specified size.


These commands interrogate files and print information about them:


file(I) determine file type (best guess).

od(I) octal dump (and other kinds also).

wc(I) word (and line) count.


Miscellaneous commands: 


find(I) search directory structure for specified kinds of files.

gath(I) gather real and virtual files; alias for send(I).

help(I) ask for help about a specific error message.

reform(I) reformat “tabbed” files (often used to truncate lines).

sort(I) sort or merge files.

tee(I) copy single input to several output files. 

uniq(I) report repeated lines in a file, or obtain unique ones. 


8. C PROGRAMMING 

Try to use existing tools first, before writing C programs at all.

8.1 Programming in C—A Tutorial ++

Read; try the examples. 

8.2 C Reference Manual ++ 

Terse but complete reference manual. 

8.3 A New Input-Output Package + 


Describes a new I/O package that is superseding many of the existing routines; write any new code
using this package. 


8.4 UNIX Programming + 

8.5 YACC—Yet Another Compiler Compiler 

8.6 LEX—Lexical Analyzer Generator 

8.7 Make—A Program for Maintaining Computer Programs

8.8 Things to Do


The best way to learn C is to look at the source code of existing programs, especially ones whose
functions are well known to you. Much code can be found in directory “/sys/source”. In particular,
directories “s1” and “s2” contain the source for most of the commands. Also, investigate directory
“/usr/include”.


8.9 Manual Pages to Be Studied 


adb(I) C debugger; more powerful (but more complex) than the older cdb(I).

cc(I) C compiler.

cdb(I) C debugger (for post-mortem core dumps and other debugging).

ld(I) loader (you must know about some of its flags).

lex(I) generate lexical analyzers.

make(I) automate program regeneration procedures.

nm(I) print name (i.e., symbol) list.

prof(I) display profile data (used for program optimization).

regcmp(I) compile regular expression.

strip(I) remove symbols and relocation bits from executable file.

time(I) time a command.

yacc(I) parser generator. 


9. IBM REMOTE JOB ENTRY (RJE)

This section is for those who use PWB/UNIX to submit jobs to remote computers.

9.1 Guide to IBM Remote Job Entry for PWB/UNIX Users +

9.2 Manual Pages to Be Studied 


bfs(I) big file scanner (scans RJE output).

csplit(I) split file by context (often used to split RJE output).

fspec(V) format specification in text files.

reform(I) reformat files (often used to convert source programs from non-UNIX systems).

rjestat(I) RJE status and enquiries.

send(I) submit RJE job.


10. SOURCE CODE CONTROL SYSTEM (SCCS) 


SCCS can be used to maintain, control, and identify files of text as they are modified and updated. Its
most common use is for maintaining source programs, as well as for keeping track of successive
versions of various documents; in combination with diffmark(I), this allows one to automatically generate
“revision bars” in successive editions of such documents.


10.1 SCCS/PWB User’s Manual ++

10.2 Manual Pages to Be Studied

Of the following, get(I), delta(I), and prt(I) are most frequently used.


admin(I) administer SCCS files (including creation thereof).

chghist(I) change the history entry of an SCCS delta.

comb(I) combine SCCS deltas.

delta(I) make an SCCS delta (a permanent record of editing changes).

get(I) get a version of an SCCS file.

prt(I) print SCCS file.

rmdel(I) remove a delta from an SCCS file.

sccsdiff(I) get the differences between two SCCS deltas.

what(I) find and print SCCS identifications in files.


11. NUMERICAL COMPUTATION 

11.1 DC—An Interactive Desk Calculator

11.2 BC—An Arbitrary Precision Desk Calculator Language

11.3 RATFOR—A Preprocessor for a Rational Fortran

11.4 Manual Pages to Be Studied


bas(I) BASIC interpreter.

bc(I) interactive language; acts as a front end for dc(I).

dc(I) desk calculator.

fc(I) Fortran compiler/interpreter.

rc(I) RATFOR preprocessor.


The PWB/UNIX* document entitled


PWB/UNIX Beginner’s Course


is not yet available.


* UNIX is a Trademark/Service Mark of the Bell System.
A Tutorial Introduction to the UNIX Text Editor


B. W. Kernighan 


Bell Laboratories, Murray Hill, N. J. 


Introduction 


Ed is a “text editor”, that is, an interactive program for creating and modifying “text”, using directions provided by a user at a terminal. The text is often a document like this one, or a program or perhaps data for a program.


This introduction is meant to simplify learning ed. The recommended way to learn ed is to read this document, simultaneously using ed to follow the examples, then to read the description in section I of the UNIX manual, all the while experimenting with ed. (Solicitation of advice from experienced users is also useful.)


Do the exercises! They cover material not completely discussed in the actual text. An appendix summarizes the commands.


Disclaimer 


This is an introduction and a tutorial. For this reason, no attempt is made to cover more than a part of the facilities that ed offers (although this fraction includes the most useful and frequently used parts). Also, there is not enough space to explain basic UNIX procedures. We will assume that you know how to log on to UNIX, and that you have at least a vague understanding of what a file is.


You must also know what character to type as the end-of-line on your particular terminal. This is a “newline” on Model 37 Teletypes, and “return” on most others. Throughout, we will refer to this character, whatever it is, as “newline”.


Getting Started 


We’ll assume that you have logged in to UNIX and it has just said “%”. The easiest way to get ed is to type


ed (followed by a newline) 


You are now ready to go; ed is waiting for you to tell it what to do.


Creating Text — the Append command ‘‘a’’ 


As our first problem, suppose we want to create some text starting from scratch. Perhaps we are typing the very first draft of a paper; clearly it will have to start somewhere, and undergo modifications later. This section will show how to get some text in, just to get started. Later we’ll talk about how to change it.


When ed is first started, it is rather like 
working with a blank piece of paper — there is 
no text or information present. This must be 
supplied by the person using ed; it is usually 
done by typing in the text, or by reading it into 
ed from a file. We will start by typing in some 
text, and return shortly to how to read files. 


First a bit of terminology. In ed jargon, the text being worked on is said to be “kept in a buffer.” Think of the buffer as a work space, if you like, or simply as the information that you are going to be editing. In effect the buffer is like the piece of paper, on which we will write things, then change some of them, and finally file the whole thing away for another day.


The user tells ed what to do to his text by typing instructions called “commands.” Most commands consist of a single letter, which must be typed in lower case. Each command is typed on a separate line. (Sometimes the command is preceded by information about what line or lines of text are to be affected; we will discuss these shortly.) Ed makes no response to most commands; there is no prompting or typing of messages like “ready”. (This silence is preferred by experienced users, but sometimes a hangup for beginners.)


The first command is append, written as the letter

a

all by itself. It means “append (or add) text lines to the buffer, as I type them in.” Appending is rather like writing fresh material on a piece of paper.




So to enter lines of text into the buffer, we just type an “a” followed by a newline, followed by the lines of text we want, like this:


a
Now is the time
for all good men
to come to the aid of their party.
.


The only way to stop appending is to type a line that contains only a period. The “.” is used to tell ed that we have finished appending. (Even experienced users forget that terminating “.” sometimes. If ed seems to be ignoring you, type an extra line with just “.” on it. You may then find you’ve added some garbage lines to your text, which you’ll have to take out later.)


After the append command has been done, 
the buffer will contain the three lines 


Now is the time 
for all good men 
to come to the aid of their party. 


The “a” and “.” aren’t there, because they are 
not text. 


To add more text to what we already have, 
just issue another “a” command, and continue 
typing. 


Error Messages — ‘‘?’’

If at any time you make an error in the commands you type to ed, it will tell you by typing

?

This is about as cryptic as it can be, but with practice, you can usually figure out how you goofed.


Writing text out as a file — the Write command ‘‘w’’

It’s likely that we’ll want to save our text for later use. To write out the contents of the buffer onto a file, we use the write command

w

followed by the filename we want to write on. This will copy the buffer’s contents onto the specified file (destroying any previous information on the file). To save the text on a file named “junk”, for example, type

w junk

Leave a space between “w” and the file name. Ed will respond by printing the number of characters it wrote out. In our case, ed would respond with


68 


(Remember that blanks and the newline character at the end of each line are included in the character count.) Writing a file just makes a copy of the text; the buffer’s contents are not disturbed, so we can go on adding lines to it. This is an important point. Ed at all times works on a copy of a file, not the file itself. No change in the contents of a file takes place until you give a “w” command. (Writing out the text onto a file from time to time as it is being created is a good idea, since if the system crashes or if you make some horrible mistake, you will lose all the text in the buffer but any text that was written onto a file is relatively safe.)
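As a check on the count of 68 above, here is a minimal Python sketch (not part of ed itself) that assumes each buffer line is stored with its trailing newline and adds up the characters the way “w” reports them:

```python
# The three lines entered with "a"; each one ends in a newline
# character, which "w" counts along with the blanks.
buffer = [
    "Now is the time\n",
    "for all good men\n",
    "to come to the aid of their party.\n",
]

sizes = [len(line) for line in buffer]  # characters per line
total = sum(sizes)                      # what "w junk" would report
print(sizes, total)
```

Without the three newlines the total would be 65, which is why the reported count is larger than the visible text.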


Leaving ed — the Quit command ‘‘q’’ 

To terminate a session with ed, save the text you’re working on by writing it onto a file using the “w” command, and then type the command

q

which stands for quit. The system will respond with “%”. At this point your buffer vanishes, with all its text, which is why you want to write it out before quitting.


Exercise 1:

Enter ed and create some text using

a
. . . text . . .
.
Write it out using “w”. Then leave ed with the “q” command, and print the file, to see that everything worked. (To print a file, say

pr filename

or

cat filename

in response to “%”. Try both.)


Reading text from a file — the Edit command ‘‘e’’

A common way to get text into the buffer is to read it from a file in the file system. This is what you do to edit text that you saved with the “w” command in a previous session. The edit command “e” fetches the entire contents of a file into the buffer. So if we had saved the three lines “Now is the time”, etc., with a “w” command in an earlier session, the ed command


e junk 


would fetch the entire contents of the file “junk” into the buffer, and respond

68

which is the number of characters in “junk”. If anything was already in the buffer, it is deleted first.


If we use the “e” command to read a file into the buffer, then we need not use a file name after a subsequent “w” command; ed remembers the last file name used in an “e” command, and “w” will write on this file. Thus a common way to operate is

ed
e file
[editing session]
w
q


You can find out at any time what file name ed is remembering by typing the file command “f”. In our case, if we typed

f

ed would reply

junk


Reading text from a file — the Read command ‘‘r’’

Sometimes we want to read a file into the buffer without destroying anything that is already there. This is done by the read command “r”. The command

r junk

will read the file “junk” into the buffer; it adds it to the end of whatever is already in the buffer. So if we do a read after an edit:

e junk
r junk

the buffer will contain two copies of the text (six lines).


Now is the time 
for all good men 
to come to the aid of their party. 
Now is the time 
for all good men 
to come to the aid of their party. 


Like the “w” and “e” commands, “r” prints the number of characters read in, after the reading operation is complete.

Generally speaking, “r” is much less used than “e”.
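The difference between “e” and “r” can be modeled with a list of lines. The following is a minimal Python sketch, not ed’s actual implementation; the `files` dictionary stands in for the file system, and `cmd_e`, `cmd_r`, and `chars` are hypothetical names chosen for illustration:

```python
# Hypothetical stand-in for the file system: file name -> list of lines.
files = {
    "junk": ["Now is the time",
             "for all good men",
             "to come to the aid of their party."],
}


def chars(lines):
    """Character count as ed reports it: the text plus one newline per line."""
    return sum(len(line) + 1 for line in lines)


def cmd_e(name):
    """e name: discard the old buffer and load the whole file in its place."""
    lines = list(files[name])
    print(chars(lines))
    return lines


def cmd_r(buffer, name, after=None):
    """r name: insert the file after line `after` (default: after the last line)."""
    lines = files[name]
    pos = len(buffer) if after is None else after
    buffer[pos:pos] = lines  # existing lines are kept, not replaced
    print(chars(lines))


buffer = cmd_e("junk")   # prints 68
cmd_r(buffer, "junk")    # prints 68; the buffer now holds six lines
```

Passing `after=0` to `cmd_r` would put the file before the first line instead of after the last one.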


Exercise 2: 

Experiment with the “e” command: try reading and printing various files. You may get an error “?”, typically because you spelled the file name wrong. Try alternately reading and appending to see that they work similarly. Verify that

ed filename

is exactly equivalent to

ed
e filename

What does

f filename

do?


Printing the contents of the buffer — the Print 
command ‘‘p’’ 

To print or list the contents of the buffer (or parts of it) on the terminal, we use the print command

p

The way this is done is as follows. We specify the lines where we want printing to begin and where we want it to end, separated by a comma, and followed by the letter “p”. Thus to print the first two lines of the buffer, for example (that is, lines 1 through 2), we say

1,2p    (starting line=1, ending line=2 p)

Ed will respond with


Now is the time 
for all good men 


Suppose we want to print all the lines in the buffer. We could use “1,3p” as above if we knew there were exactly 3 lines in the buffer. But in general, we don’t know how many there are, so what do we use for the ending line number? Ed provides a shorthand symbol for “line number of last line in buffer”: the dollar sign “$”. Use it this way:

1,$p

This will print all the lines in the buffer (line 1 to last line). If you want to stop the printing before it is finished, push the DEL or Delete key; ed will type

?

and wait for the next command.

To print the last line of the buffer, we could use

$,$p

but ed lets us abbreviate this to

$p

We can print any single line by typing the line number followed by a “p”. Thus

1p

produces the response

Now is the time

which is the first line of the buffer.


In fact, ed lets us abbreviate even further: we can print any single line by typing just the line number; there is no need to type the letter “p”. So if we say

$

ed will print the last line of the buffer for us. We can also use “$” in combinations like

$-1,$p

which prints the last two lines of the buffer. This helps when we want to see how far we got in typing.


Exercise 3: 


As before, create some text using the append command and experiment with the “p” command. You will find, for example, that you can’t print line 0 or a line beyond the end of the buffer, and that attempts to print a buffer in reverse order by saying

3,1p

don’t work.


The current line — ‘‘Dot’’ or ‘‘.’’


Suppose our buffer still contains the six lines as above, that we have just typed

1,3p

and ed has printed the three lines for us. Try typing just

p    (no line numbers).

This will print

to come to the aid of their party.

which is the third line of the buffer. In fact it is the last (most recent) line that we have done anything with. (We just printed it!) We can repeat this “p” command without line numbers, and it will continue to print line 3.

The reason is that ed maintains a record of the last line that we did anything to (in this case, line 3, which we just printed) so that it can be used instead of an explicit line number. This most recent line is referred to by the shorthand symbol

.    (pronounced “dot”).


Dot is a line number in the same way that “$” is; it means exactly “the current line”, or loosely, “the line we most recently did something to.” We can use it in several ways; one possibility is to say

.,$p

This will print all the lines from (including) the current line to the end of the buffer. In our case these are lines 3 through 6.


Some commands change the value of dot, while others do not. The print command sets dot to the number of the last line printed; by our last command, we would have “.” = “$” = 6.

Dot is most useful when used in combinations like this one:

.+1    (or equivalently, .+1p)

This means “print the next line” and gives us a handy way to step slowly through a buffer. We can also say

.-1    (or .-1p)

which means “print the line before the current line.” This enables us to go backwards if we wish. Another useful one is something like

.-3,.-1p

which prints the previous three lines.


Don’t forget that all of these change the value of dot. You can find out what dot is at any time by typing

.=

Ed will respond by printing the value of dot.


Let’s summarize some things about the “p” command and dot. Essentially “p” can be preceded by 0, 1, or 2 line numbers. If there is no line number given, it prints the “current line”, the line that dot refers to. If there is one line number given (with or without the letter “p”), it prints that line (and dot is set there); and if there are two line numbers, it prints all the lines in that range (and sets dot to the last line printed). If two line numbers are specified, the first can’t be bigger than the second (see Exercise 3).

Typing a single newline will cause printing of the next line; it’s equivalent to “.+1p”. Try it. Try typing “^”; it’s equivalent to “.-1p”.
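The rules just summarized can be expressed as a small Python model. This is a sketch under simplifying assumptions, not a description of ed’s internals: an address is only a number, “.”, or “$”, optionally followed by +n or -n (a bare “+1” or “-1” counts from dot), and `resolve` and `cmd_p` are hypothetical names. Errors are reported by raising `ValueError("?")`, mirroring ed’s single “?”.

```python
import re

# A number, "." or "$", optionally followed by +n or -n.
ADDRESS = re.compile(r"([.$]|\d+)?([+-]\d+)?")


def resolve(addr, dot, last):
    """Turn one address into a 1-based line number, or fail with "?"."""
    m = ADDRESS.fullmatch(addr)
    if not addr or m is None:
        raise ValueError("?")
    base, offset = m.groups()
    if base is None or base == ".":
        n = dot             # "." or a bare +n/-n is relative to dot
    elif base == "$":
        n = last            # "$" is the last line in the buffer
    else:
        n = int(base)
    n += int(offset or 0)
    if not 1 <= n <= last:  # no line 0, nothing past the end
        raise ValueError("?")
    return n


def cmd_p(buffer, dot, first=".", second=None):
    """p with 0, 1 or 2 addresses; returns (lines printed, new dot)."""
    last = len(buffer)
    lo = resolve(first, dot, last)
    hi = resolve(second, dot, last) if second is not None else lo
    if lo > hi:             # e.g. 3,1p is refused
        raise ValueError("?")
    return buffer[lo - 1:hi], hi  # dot moves to the last line printed
```

With the three-line buffer, `cmd_p(buffer, 1, "$-1", "$")` returns the last two lines and a new dot of 3; a bare newline corresponds to `cmd_p(buffer, dot, ".+1")`, and “^” to `cmd_p(buffer, dot, ".-1")`.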


Deleting lines: the ‘‘d’’ command


Suppose we want to get rid of the three extra lines in the buffer. This is done by the delete command

d

Except that “d” deletes lines instead of printing them, its action is similar to that of “p”. The lines to be deleted are specified for “d” exactly as they are for “p”:

starting line, ending line d

Thus the command

4,$d

deletes lines 4 through the end. There are now three lines left, as we can check by using

1,$p

And notice that “$” now is line 3! Dot is set to the next line after the last line deleted, unless the last line deleted is the last line in the buffer. In that case, dot is set to “$”.
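The way “d” moves dot can be sketched in Python, assuming the buffer is a list of lines and the addresses have already been turned into numbers. `cmd_d` is a hypothetical name, and this models the behavior described above rather than ed’s code:

```python
def cmd_d(buffer, first, last):
    """Delete lines first..last (1-based, inclusive) and return the new dot."""
    if not 1 <= first <= last <= len(buffer):
        raise ValueError("?")
    through_end = last == len(buffer)
    del buffer[first - 1:last]
    if through_end:
        return len(buffer)  # deleted the last line: dot becomes the new "$"
    return first            # the line after the deleted ones is now numbered `first`


lines = ["Now is the time", "for all good men", "to come to the aid of their party."]
buffer = lines + lines                # the six-line buffer from the "r" example
dot = cmd_d(buffer, 4, len(buffer))   # the equivalent of "4,$d"
```

Afterward `buffer` again holds the original three lines and `dot` is 3, matching “$”.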


Exercise 4: 

Experiment with “a”, “e”, “r”, “w”, “p”, and “d” until you are sure that you know what they do, and until you understand how dot, “$”, and line numbers are used.


If you are adventurous, try using line numbers with “a”, “r”, and “w” as well. You will find that “a” will append lines after the line number that you specify (rather than after dot); that “r” reads a file in after the line number you specify (not necessarily at the end of the buffer); and that “w” will write out exactly the lines you specify, not necessarily the whole buffer. These variations are sometimes handy. For instance you can insert a file at the beginning of a buffer by saying

0r filename

and you can enter lines at the beginning of the buffer by saying

0a
. . . text . . .
.

Notice that “.w” is very different from

w
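Appending after a given line and writing a chosen range both come down to list slicing. Here is a minimal Python sketch, assuming the buffer is a list of lines without newlines; `cmd_a` and `cmd_w` are hypothetical names, and “r” with a line number would insert a file’s lines the same way `cmd_a` does:

```python
def cmd_a(buffer, after, new_lines):
    """N a: insert new_lines after line N; N = 0 puts them before line 1."""
    if not 0 <= after <= len(buffer):
        raise ValueError("?")
    buffer[after:after] = new_lines


def cmd_w(buffer, first=1, last=None):
    """first,last w: the text that would be written, one newline per line."""
    last = len(buffer) if last is None else last
    return "".join(line + "\n" for line in buffer[first - 1:last])


buffer = ["for all good men"]
cmd_a(buffer, 0, ["Now is the time"])   # like "0a"
whole = cmd_w(buffer)                   # like "w": every line
current = cmd_w(buffer, 1, 1)           # like ".w" when dot is line 1
```

The last two calls show the difference noted above: “w” with no line numbers writes the whole buffer, while “.w” writes only the current line.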


Modifying text: the Substitute command ‘‘s’’ 


We are now ready to try one of the most important of all commands: the substitute command

s

This is the command that is used to change individual words or letters within a line or group of lines. It is what we use, for example, for correcting spelling mistakes and typing errors.


Suppose that by a typing error, line 1 says

Now is th time

(the “e” has been left off “the”). We can use “s” to fix this up as follows:

1s/th/the/

This says: “in line 1, substitute for the characters ‘th’ the characters ‘the’.” To verify that it works (ed will not print the result automatically) we say

p

and get

Now is the time

which is what we wanted. Notice that dot must have been set to the line where the substitution took place, since the “p” command printed that line. Dot is always set this way with the “s” command.


The general way to use the substitute command is


starting-line, ending-line s/change this/to this/


Whatever string of characters is between the first pair of slashes is replaced by whatever is between the second pair, in all the lines between starting-line and ending-line. Only the first occurrence on each line is changed, however. If you want to change every occurrence, see Exercise 5. The rules for line numbers are the same as those for “p”, except that dot is set to the last line changed. (But there is a trap for the unwary: if no substitution took place, dot is not changed. This causes an error “?” as a warning.)
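The rules above can be modelled in a few lines of Python. This is a minimal sketch, not ed's own code. It assumes that the buffer is a list of strings numbered from 1, that Python's `re` module stands in for ed's patterns (the two syntaxes differ in places), and that a hypothetical `EdError` exception plays the part of ed's “?”.

```python
import re


class EdError(Exception):
    """Hypothetical stand-in for ed's terse "?" diagnostic."""


def substitute(buffer, start, end, pattern, replacement, global_flag=False):
    """Model `start,end s/pattern/replacement/` on a 1-based buffer.

    Only the first match on each line is replaced unless global_flag is set.
    Returns the new value of dot: the last line actually changed.
    """
    count = 0 if global_flag else 1  # for re.subn, count=0 means "every match"
    last_changed = None
    for n in range(start, end + 1):
        new_line, hits = re.subn(pattern, replacement, buffer[n - 1], count=count)
        if hits:
            buffer[n - 1] = new_line
            last_changed = n
    if last_changed is None:
        # Nothing matched anywhere in the range: complain and leave dot alone.
        raise EdError("?")
    return last_changed


buf = ["Now is th time", "for all good men", "to come to the aid of their party."]
dot = substitute(buf, 1, 1, "th", "the")  # like 1s/th/the/
```

The caller keeps its old value of dot whenever `EdError` is raised, which mirrors the trap described above.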

Thus we can say 

1,$s/speling/spelling/


and correct the first spelling mistake on each line in the text. (This is useful for people who are consistent misspellers!)

If no line numbers are given, the “s” command assumes we mean “make the substitution on line dot”, so it changes things only on the current line. This leads to the very common sequence


s/something/something else/p


which makes some correction on the current line, and then prints it, to make sure it worked out right. If it didn’t, we can try again. (Notice that we put a print command on the same line as the substitute. With few exceptions, “p” can follow any command; no other multi-command lines are legal.)

It’s also legal to say

s/...//

which means “change the first string of characters to nothing”, i.e., remove them. This is useful for deleting extra words in a line or removing extra letters from words. For instance, if we had


Nowxx is the time
we can say

s/xx//p
to get

Now is the time


Notice that “//” here means “no characters”, not a blank. There is a difference! (See below for another meaning of “//”.)


Exercise 5:


Experiment with the substitute command. See what happens if you substitute for some word on a line with several occurrences of that word. For example, do this:


a
the other side of the coin
.
s/the/on the/p

You will get

on the other side of the coin


A substitute command changes only the first occurrence of the first string. You can change all occurrences by adding a “g” (for “global”) to the “s” command, like this:


s/.../.../gp


Try other characters instead of slashes to delimit the two sets of characters in the “s” command; anything should work except blanks or tabs.


(If you get funny results using any of the characters


^ . $ [ * \


read the section on “Special Characters”.)




Context searching: “/.../”


With the substitute command mastered, we can move on to another highly important idea of ed: context searching.


Suppose we have our original three line 
text in the buffer: 


Now is the time 
for all good men 
to come to the aid of their party. 


Suppose we want to find the line that contains 
“their” so we can change it to “the”. Now with 
only three lines in the buffer, it’s pretty easy to 
keep track of what line the word “their” is on. 
But if the buffer contained several hundred lines, and we’d been making changes, deleting and rearranging lines, and so on, we would no longer really know what this line number would be. Context searching is simply a method of specifying the desired line, regardless of what its number is, by specifying some context on it.

The way we say “search for a line that 
contains this particular string of characters” is to 
type 

/string of characters we want to find/

For example, the ed line 

/their/


is a context search which is sufficient to find the 
desired line — it will locate the next occurrence 
of the characters between slashes (“their”). It 
also sets dot to that line and prints the line for 
verification: 


to come to the aid of their party. 


“Next occurrence” means that ed starts looking for the string at line “.+1”, searches to the end of the buffer, then continues at line 1 and searches to line dot. (That is, the search “wraps around” from “$” to 1.) It scans all the lines in the buffer until it either finds the desired line or gets back to dot again. If the given string of characters can’t be found in any line, ed types the error message

?

Otherwise it prints the line it found.
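The order in which ed examines lines can be written out as a short loop. The Python sketch below only illustrates that order. It assumes a list-of-strings buffer, uses a plain substring test (ed itself treats the text as a pattern), and raises `LookupError` in place of ed’s “?”.

```python
def context_search(buffer, dot, text):
    """Return the number of the next line containing text, as /text/ does.

    Lines are numbered from 1. Scanning starts at dot+1, runs to the last
    line ($), wraps around to line 1 and stops after examining dot itself.
    """
    last = len(buffer)
    line = dot
    for _ in range(last):
        line = line % last + 1  # step forward, wrapping from $ back to 1
        if text in buffer[line - 1]:  # plain substring test, not a pattern
            return line
    raise LookupError("?")  # stands in for ed's error message


buf = ["Now is the time", "for all good men", "to come to the aid of their party."]
dot = context_search(buf, 1, "their")  # like /their/ with dot at line 1
```

Because the loop visits dot itself last, searching from line 2 for “good” still finds line 2. A reverse search would simply step backwards instead, wrapping from 1 to “$”.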


We can do both the search for the desired line and a substitution all at once, like this:


/their/s/their/the/p
which will yield
to come to the aid of the party.


There were three parts to that last command: 
context search for the desired line, make the 
substitution, print the line. 


The expression “/their/” is a context
search expression. In their simplest form, all 
context search expressions are like this — a 
string of characters surrounded by slashes. Con- 
text searches are interchangeable with line 
numbers, so they can be used by themselves to 
find and print a desired line, or as line numbers 
for some other command, like “s”. We used 
them both ways in the examples above. 


Suppose the buffer contains the three fami- 
liar lines 
Now is the time 


for all good men 
to come to the aid of their party. 


Then the ed line numbers
/Now/+1
/good/
/party/-1
are all context search expressions, and they all refer to the same line (line 2). To make a
change in line 2, we could say 


/Now/+1s/good/bad/ 
or 

/good/s/good/bad/ 
or 

/party/-1s/good/bad/


The choice is dictated only by convenience. We 
could print all three lines by, for instance 


/Now/,/party/p 
or 
/Now/,/Now/+2p 


or by any number of similar combinations. The 
first one of these might be better if we don’t 
know how many lines are involved. (Of course, 
if there were only three lines in the buffer, we’d 
use 


1,3p
but not if there were several hundred.) 


The basic rule is: a context search expres- 
sion is the same as a line number, so it can be 
used wherever a line number is needed. 


Exercise 6: 

Experiment with context searching. Try a body of text with several occurrences of the same string of characters, and scan through it using the same context search.


Try using context searches as line numbers for the substitute, print and delete commands. (They can also be used with “r”, “w”, and “a”.)


Try context searching using “?text?” instead of “/text/”. This scans lines in the buffer in reverse order rather than normal. This is sometimes useful if you go too far while looking for some string of characters; it’s an easy way to back up.

(If you get funny results with any of the characters

^ . $ [ * \
read the section on “Special Characters”.)

Ed provides a shorthand for repeating a 
context search for the same string. For example, 
the ed line number 


/string/ 


will find the next occurrence of “string”. It of- 
ten happens that this is not the desired line, so 
the search must be repeated. This can be done 
by typing merely 


// 


This shorthand stands for “the most recently 
used context search expression.” It can also be 
used as the first string of the substitute com- 
mand, as in 

/string1/s//string2/ 
which will find the next occurrence of “string!” 
and replace it by “string2’’. This can save a lot 
of typing. Similarly 

a7 
means ‘scan backwards for the same expres- 
sion.” 


Change and Insert: “c” and “i”
This section discusses the change command
c
which is used to change or replace a group of
one or more lines, and the insert command
i
which is used for inserting a group of one or
more lines.
“Change”, written as
c


is used to replace a number of lines with different lines, which are typed in at the terminal. For example, to change lines “.+1” through “$” to something else, type


.+1,$c
... type the lines of text you want here ...
.


The lines you type between the “c” command and the “.” will take the place of the original lines between start line and end line. This is most useful in replacing a line or several lines which have errors in them.


If only one line is specified in the “c” command, then just that line is replaced. (You can type in as many replacement lines as you like.) Notice the use of “.” to end the input; this works just like the “.” in the append command and must appear by itself on a new line. If no line number is given, line dot is replaced. The value of dot is set to the last line you typed in.


“Insert” is similar to append — for instance 


/string/i
... type the lines to be inserted here ...
.


will insert the given text before the next line that contains “string”. The text between “i” and “.” is inserted before the specified line. If no line number is specified dot is used. Dot is set to the last line inserted.


Exercise 7: 


“Change” is rather like a combination of delete followed by insert. Experiment to verify that


start, end d
i
... text ...
.


is almost the same as


start, end c
... text ...
.


These are not precisely the same if line “$” gets deleted. Check this out. What is dot?


Experiment with “a” and “i”, to see that they are similar, but not the same. You will observe that

line-number a
... text ...
.

appends after the given line, while

line-number i
... text ...
.

inserts before it. Observe that if no line number is given, “i” inserts before line dot, while “a” appends after line dot.
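The difference between “a”, “i” and “c” comes down to where the new lines go and where dot ends up. The sketch below models this in Python. The function names are hypothetical, the buffer is a list of strings numbered from 1, and the sketch assumes that at least one new line is typed (ed’s handling of empty input is not modelled).

```python
def append(buffer, line, new_lines):
    """`line a`: add new_lines after line `line` (0 means at the very top)."""
    buffer[line:line] = new_lines
    return line + len(new_lines)  # dot is the last line appended


def insert(buffer, line, new_lines):
    """`line i`: add new_lines before line `line`."""
    return append(buffer, line - 1, new_lines)


def change(buffer, start, end, new_lines):
    """`start,end c`: replace lines start..end with new_lines."""
    buffer[start - 1:end] = new_lines
    return start - 1 + len(new_lines)  # dot is the last line typed in


buf = ["a", "b", "c"]
dot = insert(buf, 1, ["x"])  # like 1i ... x ... .
```

Writing `insert` in terms of `append` reflects the observation above: inserting before a line is the same as appending after the line before it.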




Moving text around: the ‘‘m’’ command 

The move command “m” is used for cutting and pasting: it lets you move a group of lines from one place to another in the buffer. Suppose we want to put the first three lines of the buffer at the end instead. We could do it by saying:

1,3w temp

$r temp

1,3d
(Do you see why?) But we can do it a lot easier with the “m” command:


1,3m$ 
The general case is 

start line, end line m after this line 
Notice that there is a third line to be specified — 
the place where the moved stuff gets put. Of 


course the lines to be moved can be specified by 
context searches; if we had 


First paragraph 


end of first paragraph. 
Second paragraph 


end of second paragraph. 

we could reverse the two paragraphs like this: 
/Second/,/second/m/First/-1

Notice the “-1”: the moved text goes after the line mentioned. Dot gets set to the last line moved.
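The move command can be pictured as “cut the block out, then paste it after the target”. Here is a minimal Python sketch under the same list-of-strings assumption. `move` is a hypothetical name. The sketch numbers the target line as it was before the move and refuses a target inside the block, which follows common ed behaviour but is an assumption here.

```python
def move(buffer, start, end, after):
    """Model `start,end m after` on a 1-based buffer.

    `after` is numbered as in the buffer before the move (0 means the top).
    Returns dot: the new number of the last line moved.
    """
    if start <= after < end:
        raise ValueError("?")  # cannot move a block after one of its own lines
    block = buffer[start - 1:end]
    del buffer[start - 1:end]
    if after >= end:  # the target lay below the block, so it shifted up
        after -= len(block)
    buffer[after:after] = block
    return after + len(block)


buf = ["1", "2", "3", "4", "5"]
dot = move(buf, 1, 3, 5)  # like 1,3m$
```

Swapping the two paragraphs in the example corresponds to `move(paras, 3, 4, 0)`, since “/First/-1” is line 0 when “First paragraph” is line 1.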


The global commands “g” and “v”

The global command “g” is used to execute one or more ed commands on all those lines in the buffer that match some specified string. For example


g/peling/p


prints all lines that contain “peling”. More usefully,


g/peling/s//pelling/gp


makes the substitution everywhere on the line, then prints each corrected line. Compare this to


1,$s/peling/pelling/gp
which only prints the last line substituted. Another subtle difference is that the “g” command does not give a “?” if “peling” is not found, whereas the “s” command will.

There may be several commands (including “a”, “c”, “i”, “r”, “w”, but not “g”); in that case, every line except the last must end with a backslash “\”:


g/xxx/.-1s/abc/def/\
.+2s/ghi/jkl/\
.-2,.p
makes changes in the lines before and after each line that contains “xxx”, then prints all three lines.


The “v” command is the same as “g”, except that the commands are executed on every line that does not match the string following “v”:


v/ /d
deletes every line that does not contain a blank.
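The essential idea of “g” and “v” is that ed first decides which lines qualify, then runs the commands once for each of them. The Python sketch below shows that two-step shape. The name `global_command` and the use of a Python callable in place of real ed commands are assumptions for illustration. The sketch also assumes the callable does not add or delete lines, so it does not handle a case like “v/ /d”.

```python
import re


def global_command(buffer, pattern, command, invert=False):
    """Model g/pattern/command, or v/pattern/command when invert is True.

    Every line is checked first; `command(buffer, n)` then runs once for
    each selected line number n (numbered from 1). Selecting nothing is
    not an error, unlike a failed s command.
    """
    selected = [n for n, text in enumerate(buffer, start=1)
                if (re.search(pattern, text) is None) == invert]
    for n in selected:
        command(buffer, n)
    return selected
```

Because the callable runs once per selected line, a “substitute and print” command prints every corrected line. That is the difference from “1,$s/peling/pelling/gp” noted above.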


Special Characters 


You may have noticed that things just don’t work right when you used some characters like “.”, “*”, “$”, and others in context searches and the substitute command. The reason is rather complex, although the cure is simple. Basically, ed treats these characters as special, with special meanings. For instance, in a context search or the first string of the substitute command only,


/x.y/


means “a line with an x, any character, and a y,” not just “a line with an x, a period, and a y.” A complete list of the special characters that can cause trouble is the following:

^ . $ [ * \
Warning: The backslash character \ is special to ed. For safety’s sake, avoid it where possible. If you have to use one of the special characters in a substitute command, you can turn off its magic meaning temporarily by preceding it with the backslash. Thus

s/\\\.\*/backslash dot star/
will change “\.*” into “backslash dot star”.

Here is a hurried synopsis of the other special characters. First, the circumflex “^” signifies the beginning of a line. Thus

/^string/
finds “string” only if it is at the beginning of a line: it will find

string
but not

the string...


The dollar-sign “$” is just the opposite of the circumflex; it means the end of a line:


/string$/


will only find an occurrence of “string” that is at the end of some line. This implies, of course, that

/^string$/
will find only a line that contains just “string”, and

/^.$/
finds a line containing exactly one character.

The character “.”, as we mentioned above, matches anything;

/x.y/
matches any of

x+y

x-y

x y

x.y


This is useful in conjunction with “*”, which is a repetition character; “a*” is a shorthand for “any number of a’s,” so “.*” matches any number of anythings. This is used like this:
s/.*/stuff/ 

which changes an entire line, or 
s/.*,// 


which deletes all characters in the line up to and 
including the last comma. (Since “.*” finds the 
longest possible match, this goes up to the last 
comma.) 

“(" is used with “]” to form “character 
classes’; for example, 


/{1234567890]/ 


matches any single digit — any one of the char- 
acters inside the braces will cause a match. 
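For these particular characters, Python’s `re` module happens to agree with ed, so the rules above can be checked quickly in Python. This is only an illustration. The sample lines are made up, and `count=1` is used to copy ed’s habit of changing only the first match on a line.

```python
import re

# ".*" takes the longest possible run, so s/.*,// removes everything up to
# and including the LAST comma on the line.
line = "one, two, three, and four"
after_last_comma = re.sub(r".*,", "", line, count=1)

# x.y needs exactly one character, any character, between the x and the y.
candidates = ["x+y", "x-y", "x y", "x.y", "xy"]
dot_matches = [s for s in candidates if re.search(r"x.y", s)]

# [1234567890] matches any single digit.
digits = re.findall(r"[1234567890]", "line 42 of 100")

# ^ and $ tie the pattern to the start and end of the line.
whole_line = bool(re.search(r"^string$", "string"))
at_start = bool(re.search(r"^string", "the string..."))
```

Here `after_last_comma` is `" and four"`, and `"xy"` is the only candidate that `x.y` rejects.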

Finally, the “&” is another shorthand character: it is used only on the right-hand part of a substitute command where it means “whatever was matched on the left-hand side”. It is used to save typing. Suppose the current line contained


Now is the time
and we wanted to put parentheses around it. We could just retype the line, but this is tedious. Or we could say

s/^/(/

s/$/)/
using our knowledge of “^” and “$”. But the easiest way uses the “&”:

s/.*/(&)/


This says “match the whole line, and replace it by itself surrounded by parens.” The “&” can be used several times in a line; consider using


s/.*/&? &!!/


to produce
Now is the time? Now is the time!!


We don’t have to match the whole line, of course: if the buffer contains


the end of the world
we could type
/world/s//& is at hand/
to produce
the end of the world is at hand


Observe this expression carefully, for it illustrates how to take advantage of ed to save typing. The string “/world/” found the desired line; the shorthand “//” found the same word in the line; and the “&” saved us from typing it again.

The “&” is a special character only within the replacement text of a substitute command, and has no special meaning elsewhere. We can turn off the special meaning of “&” by preceding it with a backslash:


s/ampersand/\&/


will convert the word “ampersand” into the literal symbol “&” in the current line.
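Python’s `re.sub` has no “&” shorthand. Its closest equivalent is `\g<0>`, which means “the whole match”. The helper below is a hypothetical sketch that translates an ed right-hand side into a Python replacement so that the examples above can be tried. It handles only “&”, “\&” and otherwise literal backslashes; other ed escapes are not covered.

```python
import re


def ed_replacement(text):
    """Translate an ed right-hand side into a Python re.sub template.

    '&' becomes r'\g<0>' (the whole match) and '\&' becomes a literal '&'.
    Any other backslash is kept as a literal backslash.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == "&":
            out.append("&")  # \& turns off the special meaning of &
            i += 2
            continue
        if ch == "&":
            out.append(r"\g<0>")
        elif ch == "\\":
            out.append(r"\\")  # escaped so re.sub emits one real backslash
        else:
            out.append(ch)
        i += 1
    return "".join(out)


line = "Now is the time"
parens = re.sub(r".*", ed_replacement("(&)"), line, count=1)  # s/.*/(&)/
```

`count=1` matters here: without it, Python would also substitute for the empty match at the end of the line.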


Summary of Commands and Line Numbers 


The general form of ed commands is the command name, perhaps preceded by one or two line numbers, and, in the case of e, r and w, followed by a file name. Only one command is allowed per line, but a p command may follow any other command (except for e, r, w and g).


a (append) Add lines to the buffer (at line dot, unless a different line is specified). Appending continues until “.” is typed on a new line. Dot is set to the last line appended.


c (change) Change the specified lines to the new 
text which follows. The new lines are terminat- 
ed by a “.”. If no lines are specified, replace line 
dot. Dot is set to last line changed. 


d (delete) Delete the lines specified. If none are specified, delete line dot. Dot is set to the first undeleted line, unless “$” is deleted, in which case dot is set to “$”.


e (edit) Edit new file. Any previous contents of 
the buffer are thrown away, so issue a w before- 
hand if you want to save them. 


f (file) Print remembered filename. If a name follows f the remembered name will be set to it.


g (global) g/---/commands will execute the commands on those lines that contain “---”, which can be any context search expression.


i (insert) Insert lines before specified line (or dot) until a “.” is typed on a new line. Dot is set to last line inserted.


m (move) Move lines specified to after the line named after m. Dot is set to the last line moved.


p (print) Print specified lines. If none specified, print line dot. A single line number is equivalent to “line-number p”. A single newline prints “.+1”, the next line.


q (quit) Exit from ed. Wipes out all text in 
buffer!! 


r (read) Read a file into buffer (at end unless 
specified elsewhere.) Dot set to last line read. 


s (substitute) s/string1/string2/ will substitute the characters of ‘string2’ for ‘string1’ in specified lines. If no line is specified, make substitution in line dot. Dot is set to last line in which a substitution took place, which means that if no substitution took place, dot is not changed. s changes only the first occurrence of string1 on a line; to change all of them, type a “g” after the final slash.


v (exclude) v/---/commands executes “commands” on those lines that do not contain “---”.


w (write) Write out buffer onto a file. Dot is not 
changed. 
.= (dot value) Print value of dot. (“=” by itself prints the value of “$”.)
! (temporary escape) Execute this line as a UNIX command.
/-----/ Context search. Search for next line which contains this string of characters. Print it. Dot is set to line where string found. Search starts at “.+1”, wraps around from “$” to 1, and continues to dot, if necessary.


?-----? Context search in reverse direction. Start search at “.-1”, scan to 1, wrap around to “$”.
ABSTRACT


This paper is meant to help secretaries, typists and programmers to make 
effective use of the UNIX facilities for preparing and editing text. It provides 
explanations and examples of 


• special characters, line addressing and global commands in the editor ed;


• commands for ‘cut and paste’ operations on files and parts of files, including the mv, cp, cat and rm commands, and the r, w, m and t commands of the editor;


• editing scripts and editor-based programs like grep and sed.


Although the treatment is aimed at non-programmers, new UNIX users 
with any background should find helpful hints on how to get their jobs done 
more easily. 


B.4 


Advanced Editing on UNIX 


Brian W. Kernighan 


Bell Laboratories 
Murray Hill, New Jersey 07974


1. INTRODUCTION 


Although UNIX provides remarkably 
effective tools for text editing, that by itself is no 
guarantee that everyone will automatically make 
the most effective use of them. In particular, 
people who are not computer specialists — typ- 
ists, secretaries, casual users — often use the 
system less effectively than they might. 


This document is intended as a sequel to A Tutorial Introduction to the UNIX Text Editor [1], providing explanations and examples of how to edit with less effort. (You should also be familiar with the material in UNIX For Beginners [2].) Further information on all commands discussed here can be found in The UNIX Programmer’s Manual [3].


Examples are based on observations of 
users and the difficulties they encounter. Topics 
covered include special characters in searches 
and substitute commands, line addressing, the 
global commands, and line moving and copying. 
There are also brief discussions of effective use 
of related tools, like those for file manipulation, 
and those based on ed, like grep and sed. 


A word of caution. There is only one way 
to learn to use something, and that is to use it. 
Reading a description is no substitute for trying 
something. A paper like this one should give 
you ideas about what to try, but until you actu- 
ally try something, you will not learn it. 


2. SPECIAL CHARACTERS 


The editor ed is the primary interface to 
the system for many people, so it is worthwhile 
to know how to get the most out of ed for the 
least effort. 


The next few sections will discuss 
shortcuts and labor-saving devices. Not all of 
these will be instantly useful to any one person, 
of course, but a few will be, and the others 
should give you ideas to store away for future 
use. And as always, until you try these things, 
they will remain theoretical knowledge, not 
something you have confidence in. 


The List command ‘l’


ed provides two commands for printing the 
contents of the lines you’re editing. Most people 
are familiar with p, in combinations like 


1,$p
to print all the lines you’re editing, or
s/abc/def/p


to change ‘abc’ to ‘def’ on the current line. Less familiar is the list command l (the letter ‘l’), which gives slightly more information than p. In particular, l makes visible characters that are normally invisible, such as tabs and backspaces. If you list a line that contains some of these, l will print each tab as > and each backspace as <. This makes it much easier to correct the sort of typing mistake that inserts extra spaces adjacent to tabs, or inserts a backspace followed by a space.


The l command also ‘folds’ long lines for printing: any line that exceeds 72 characters is printed on multiple lines; each printed line except the last is terminated by a backslash \, so you can tell it was folded. This is useful for printing long lines on short terminals.


Occasionally the l command will print in a
line a string of numbers preceded by a backslash, 
such as \07 or \16. These combinations are used 
to make visible characters that normally don't 
print, like form feed or vertical tab or bell. Each 
such combination is a single character. When 
you see such characters, be wary — they may 
have surprising meanings when printed on some 
terminals. Often their presence means that your 
finger slipped while you were typing; you almost 
never want them. 
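The display rules just described can be sketched in Python. This follows the description in the text and is not ed’s actual code: `list_line` is a hypothetical name, the 72-column fold and the two-digit octal escapes come from the paragraphs above, and a real ed may choose widths or fold points differently (for example, this sketch may split an escape across two printed lines).

```python
def list_line(line, width=72):
    """Return the printed pieces of `line` as the l command shows them.

    Tabs print as '>', backspaces as '<', and other non-printing characters
    as a backslash followed by octal digits (bell shows as \\07). Output is
    folded so no piece exceeds `width`; every piece but the last ends in '\\'.
    """
    shown = []
    for ch in line:
        if ch == "\t":
            shown.append(">")
        elif ch == "\b":
            shown.append("<")
        elif ord(ch) < 32 or ord(ch) == 127:
            shown.append("\\%02o" % ord(ch))  # e.g. bell (7) becomes \07
        else:
            shown.append(ch)
    text = "".join(shown)
    pieces = []
    while len(text) > width:
        pieces.append(text[:width - 1] + "\\")  # mark the fold
        text = text[width - 1:]
    pieces.append(text)
    return pieces
```

For example, `list_line("th\x07is")` gives `["th\\07is"]`, which prints as `th\07is`.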


The Substitute Command ‘s’ 


Most of the next few sections will be taken 
up with a discussion of the substitute command 
s. Since this is the command for changing the 
contents of individual lines, it probably has the 
most complexity of any ed command, and the 
most potential for effective use. 


As the simplest place to begin, recall the meaning of a trailing g after a substitute command. With


s/this/that/
and
s/this/that/g


the first one replaces the first ‘this’ on the line with ‘that’. If there is more than one ‘this’ on the line, the second form with the trailing g changes all of them.


Either form of the s command can be followed by p or l to ‘print’ or ‘list’ (as described in the previous section) the contents of the line:


s/this/that/p
s/this/that/l
s/this/that/gp
s/this/that/gl


are all legal, and mean slightly different things. 
Make sure you know what the differences are. 


Of course, any s command can be pre- 
ceded by one or two ‘line numbers’ to specify 
that the substitution is to take place on a group 
of lines. Thus 


1,$s/mispell/misspell/


changes the first occurrence of ‘mispell’ to ‘misspell’ on every line of the file. But
1,$s/mispell/misspell/g


changes every occurrence in every line (and this is more likely to be what you wanted in this particular case).


You should also notice that if you add a p or l to the end of any of these substitute commands, only the last line that got changed will be printed, not all the lines. We will talk later about how to print all the lines that were modified.


The Undo Command ‘u’


Occasionally you will make a substitution in a line, only to realize too late that it was a ghastly mistake. The ‘undo’ command u lets you ‘undo’ the last substitution: the last line that was substituted can be restored to its previous state by typing the command


u 


The Metacharacter ‘.’ 


As you have undoubtedly noticed when 
you use ed, certain characters have unexpected 
meanings when they occur in the left side of a 
substitute command, or in a search for a particu- 


lar line. In the next several sections, we will talk 
about these special characters, which are often 
called ‘metacharacters’.


The first one is the period ‘.’. On the left side of a substitute command, or in a search with ‘/.../’, ‘.’ stands for any single character. Thus the search


/x.y/


finds any line where ‘x’ and ‘y’ occur separated by a single character, as in


x+y
x-y
x␣y
x.y


and so on. (We will use ␣ to stand for a space whenever we need to make it clear.)


Since ‘.’ matches a single character, that 
gives you a way to deal with funny characters 
printed by l. Suppose you have a line that, when
printed with the l command, appears as


...th\07is...


and you want to get rid of the \07 (which 
represents the bell character, by the way). 


The most obvious solution is to try 
s/\07// 


but this will fail. (Try it.) The brute force solu- 
tion, which most people would now take, is to 
re-type the entire line. This is guaranteed, and is 
actually quite a reasonable tactic if the line in 
question isn’t too big, but for a very long line, 
re-typing is a bore. This is where the metachar- 
acter ‘.’ comes in handy. Since ‘\07’ really 
represents a single character, if we say 


s/th.is/this/ 


the job is done. The ‘.’ matches the mysterious 
character between the ‘h’ and the ‘i’, whatever it 
is. 
Bear in mind that since ‘.’ matches any single character, the command

s/./,/


converts the first character on a line into a ‘,’, which very often is not what you intended.


As is true of many characters in ed, the ‘.’ has several meanings, depending on its context. This line shows all three:


.s/././


The first ‘.’ is a line number, the number of the line we are editing, which is called ‘line dot’. (We will discuss line dot more in Section 3.) The second ‘.’ is a metacharacter that matches any single character on that line. The third ‘.’ is the only one that really is an honest literal period. On the right side of a substitution, ‘.’ is not special. If you apply this command to the line


Now is the time.
the result will be
.ow is the time.


which is probably not what you intended.
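Both sides of the ‘.’ story can be tried in Python, where ‘.’ also matches any single character on the left side of a substitution and is an ordinary character on the right. This is an illustration only. One difference matters here: unlike ed, Python’s `re` would accept `\07` as an octal escape, so the failure of s/\07// described earlier is specific to ed.

```python
import re

line = "...th\x07is..."  # \x07 is the bell character that l shows as \07
# One '.' stands for the invisible character, whatever it happens to be.
repaired = re.sub(r"th.is", "this", line, count=1)

# The flip side: s/./,/ and .s/././ both replace the line's FIRST character,
# because on the left '.' matches anything and on the right it is literal.
comma_first = re.sub(r".", ",", "Now is the time.", count=1)
dot_first = re.sub(r".", ".", "Now is the time.", count=1)
```

As in ed, `count=1` restricts each substitution to the first match on the line.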


The Backslash ‘\’ 


Since a period means ‘any character’, the 
question naturally arises of what to do when you 
really want a period. For example, how do you 
convert the line 


Now is the time.
into
Now is the time?


The backslash ‘\’ does the job. A backslash turns off any special meaning that the next character might have; in particular, ‘\.’ converts the ‘.’ from a ‘match anything’ into a period, so you
can use it to replace the period in 


Now is the time. 
like this: 

s/\./?/ 
The pair of characters ‘\.’ is considered by ed to 
be a single real period. 


The backslash can also be used when 
searching for lines that contain a special charac- 
ter. Suppose you are looking for a line that con- 
tains 


.PP

The search
/.PP/

isn’t adequate, for it will find a line like
THE APPLICATION OF ...


because the ‘.’ matches the letter ‘A’. But if you
say 


/\.PP/ 


you will find only lines that contain ‘.PP’. 


The backslash can also be used to turn off 
special meanings for characters other than ‘.’. 
For example, consider finding a line that con- 
tains a backslash. The search 


/\/ 


won't work, because the ‘\’ isn’t a literal ‘\’, but 
instead means that the second ‘/’ no longer del- 
imits the search. But by preceding a backslash 
with another one, you can search for a literal 
backslash. Thus 


/\\/


does work. Similarly, you can search for a for- 
ward slash ‘/’ with 


/\//


The backslash turns off the meaning of the 
immediately following ‘/’ so that it doesn’t ter- 
minate the /.../ construction prematurely. 


As an exercise, before reading further, 
find two substitute commands each of which will 
convert the line 


\x\.\y


into the line
\x\y


Here are several solutions; verify that each works as advertised.


s/\\\.//
s/x../x/
s/..y/y/
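One way to verify the three solutions is to run their Python `re` counterparts, which treat backslash and period the same way for these patterns. This is a sketch for checking only. Raw strings (`r"..."`) keep Python from interpreting the backslashes itself, and `count=1` mirrors ed’s first-match-only default.

```python
import re

original = r"\x\.\y"
solutions = [
    re.sub(r"\\\.", "", original, count=1),  # s/\\\.//  : a real \ then a real .
    re.sub(r"x..", "x", original, count=1),  # s/x../x/  : x and the next two characters
    re.sub(r"..y", "y", original, count=1),  # s/..y/y/  : the two characters before y
]
```

All three entries come out as `\x\y`.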


A couple of miscellaneous notes about 
backslashes and special characters. First, you 
can use any character to delimit the pieces of an 
s command: there is nothing sacred about 
slashes. (But you must use slashes for context 
searching.) For instance, in a line that contains a 
lot of slashes already, like 


//exec //sys.fort.go // etc...


you could use a colon as the delimiter; to delete all the slashes, type


s:/::g


Second, if # and @ are your character erase and line kill characters, you have to type \# and \@; this is true whether you’re talking to ed or any other program.


When you are adding text with a or i or c, backslash is not special, and you should only put in one backslash for each one you really want.


The Dollar Sign ‘$’


The next metacharacter, the ‘$’, stands for ‘the end of the line’. As its most obvious use, suppose you have the line


Now is the


and you wish to add the word ‘time’ to the end. Use the $ like this:


s/$/␣time/
to get
Now is the time


Notice that a space is needed before ‘time’ in the substitute command, or you will get


Now is thetime
As another example, replace the second comma in this line with a period without altering the first.


Now is the time, for all good men, 
The command needed is 
s/,$/./ 


The $ sign here provides context to make specific 
which comma we mean. Without it, of course, 
the s command would operate on the first 
comma to produce 


Now is the time. for all good men, 


As another example, to convert 
Now is the time. 
into 
Now is the time? 
as we did earlier, we can use 
s/.$/?/ 


Like ‘.’, the ‘$’ has multiple meanings depending on context. In the line


$s/$/$/


the first ‘$’ refers to the last line of the file, the second refers to the end of that line, and the third is a literal dollar sign, to be added to that line.
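The end-of-line examples in this section carry over directly to Python’s `re`, where `$` likewise matches at the end of the line. This is only an illustration: the strings are the ones used above, and `count=1` stands in for ed’s first-match rule.

```python
import re

line = "Now is the time, for all good men,"
# ",$" only matches a comma at the very end, so the first comma survives.
last_comma = re.sub(r",$", ".", line, count=1)      # s/,$/./
# Without the $, the first comma on the line is the one replaced.
first_comma = re.sub(r",", ".", line, count=1)      # s/,/./
# $ matches the empty string at the end of the line: s/$/ time/
appended = re.sub(r"$", " time", "Now is the", count=1)
# ".$" is the last character, whatever it is: s/.$/?/
question = re.sub(r".$", "?", "Now is the time.", count=1)
```

The first two results show why the `$` in s/,$/./ is needed to pick out the second comma.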


The Circumflex ‘^’


The circumflex (or hat or caret) ‘^’ stands for the beginning of the line. For example, suppose you are looking for a line that begins with ‘the’. If you simply say


/the/


you will in all likelihood find several lines that contain ‘the’ in the middle before arriving at the one you want. But with


/^the/


you narrow the context, and thus arrive at the desired one more easily.


The other use of ‘^’ is of course to enable you to insert something at the beginning of a line:

s/^/␣/


places a space at the beginning of the current line.


Metacharacters can be combined. To search for a line that contains only the characters


.PP
you can use the command
/^\.PP$/


The Star ‘*’


Suppose you have a line that looks like this:


text x       y text


where text stands for lots of text, and there is some indeterminate number of spaces between the x and the y. Suppose the job is to replace all the spaces between x and y by a single space. The line is too long to retype, and there are too many spaces to count. What now?


This is where the metacharacter ‘*’ comes in handy. A character followed by a star stands for as many consecutive occurrences of that character as possible. To refer to all the spaces at once, say


s/x␣*y/x␣y/


The construction ‘␣*’ means ‘as many spaces as possible’. Thus ‘x␣*y’ means ‘an x, as many spaces as possible, then a y’.


The star can be used with any character, not just space. If the original example was instead

text x--------y text

then all ‘-’ signs can be replaced by a single space with the command


s/x-*y/x␣y/


Finally, suppose that the line was 
text Xeecvecvcsevccsouesy text 


Can you see what trap lies in wait for the 
unwary? If you blindly type 


s/x.*y/Xqy/ 


what will happen? The answer, naturally, is that 
it depends. If there are no other x’s or y’s on 
the line, then everything works, but it’s blind 
luck, not good management. Remember that ‘.’ 
matches any single character? Then ‘.*’ matches 
as many single characters as possible, and unless 
you're careful, it can eat up a lot more of the 
line than you expected. If the line was, for 
example, like this: 


text x text x................y text y text
then saying
s/x.*y/x y/


will take everything from the first ‘x’ to the last
‘y’, which, in this example, is undoubtedly more
than you wanted.


The solution, of course, is to turn off the 
special meaning of ‘.’ with ‘\.’: 

s/x\.*y/x y/
Now everything works, for ‘\.*’ means ‘as many 
periods as possible’. 
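Here is a rough Python analogue of the trap and its cure, with `re.sub` standing in for ed's s command. The dialects differ in general, but `.`, `\.` and `*` behave alike here. The filler word is ‘words’ rather than ‘text’ because ‘text’ itself contains an x, which would spring the very trap being described.

```python
import re

line = "words x words x................y words y words"

# Like ed's  s/x.*y/x y/ : '.*' reaches from the first x to the LAST y.
greedy = re.sub(r"x.*y", "x y", line, count=1)

# Like ed's  s/x\.*y/x y/ : '\.*' matches only a run of literal periods.
literal = re.sub(r"x\.*y", "x y", line, count=1)

print(greedy)   # words x y words
print(literal)  # words x words x y words y words
```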


There are times when the pattern ‘.*’ is 
exactly what you want. For example, to change 
Now is the time for all good men ....
into 
Now is the time. 
use ‘.*’ to eat up everything after the ‘for’: 
s/ for.*/./ 


There are a couple of additional pitfalls
associated with ‘*’ that you should be aware of.
Most notable is the fact that ‘as many as possible’
means zero or more. The fact that zero is a
legitimate possibility is sometimes rather surprising.
For example, if our line contained

text xy text x     y text
and we said

s/x *y/x y/


the first ‘xy’ matches this pattern, for it consists 
of an ‘x’, zero spaces, and a ‘y’. The result is 
that the substitute acts on the first ‘xy’, and does 
not touch the later one that actually contains 
some intervening spaces. 


The way around this, if it matters, is to 
specify a pattern like 


/x  *y/


which says ‘an x, a space, then as many more
spaces as possible, then a y’.
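You can reproduce the zero-occurrence surprise with Python's `re` module standing in for ed. The dialects differ, but `*` after a space means zero or more spaces in both. `count=1` imitates ed's s without g, which changes only the first match on the line. The sample line is invented.

```python
import re

line = "words xy words x   y words"

# ed: s/x *y/x y/  -- zero spaces counts, so the plain 'xy' is hit first.
zero_ok = re.sub(r"x *y", "x y", line, count=1)

# ed: s/x  *y/x y/ -- one space, then zero or more: 'xy' no longer matches.
one_or_more = re.sub(r"x  *y", "x y", line, count=1)

print(zero_ok)      # words x y words x   y words
print(one_or_more)  # words xy words x y words
```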


The other startling behavior of ‘*’ is again
related to the fact that zero is a legitimate
number of occurrences of something followed by
a star. The command


s/x*/y/g 

when applied to the line 
abcdef 

produces 
yaybycydyeyfy 


which is almost certainly not what was intended. 
The reason for this behavior is that zero is a 
legal number of matches, and there are no x’s at 
the beginning of the line (so that gets converted 
into a ‘y’), nor between the ‘a’ and the ‘b’ (so 
that gets converted into a ‘y’), nor... and so on. 
Make sure you really want zero matches; if not, 
in this case write 


s/xx*/y/g 


‘xx*’ is one or more x’s. 
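Python's `re.sub`, used here as a stand-in for ed, replaces every match by default, just as ed does with the g suffix. It therefore reproduces the same surprise, which makes the behavior easy to check.

```python
import re

# Like ed's  s/x*/y/g : 'x*' matches the empty string around every character.
everywhere = re.sub(r"x*", "y", "abcdef")

# Like ed's  s/xx*/y/g : at least one x is required, so nothing changes.
nowhere = re.sub(r"xx*", "y", "abcdef")

print(everywhere)  # yaybycydyeyfy
print(nowhere)     # abcdef
```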


The Brackets ‘[ ]’


Suppose that you want to delete any 
numbers that appear at the beginning of all lines 
of a file. You might first think of trying a series 
of commands like 


1,$s/^1*//
1,$s/^2*//
1,$s/^3*//


and so on, but this is clearly going to take for- 
ever if the numbers are at all long. Unless you 
want to repeat the commands over and over until 
finally all numbers are gone, you must get all the 
digits on one pass. This is the purpose of the 
brackets [ and ]. 


The construction 
[1234567890]


matches any single digit; the whole thing is
called a ‘character class’. With a character class,
the job is easy. The pattern ‘[0123456789]*’
matches zero or more digits (an entire number),
so


1,$s/^[1234567890]*//


deletes all digits from the beginning of all lines. 
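Applied line by line, the same deletion looks like this in Python. This is an analogy only: the list stands in for ed's buffer, `re.sub` for the s command, and the sample lines are made up.

```python
import re

# Made-up buffer contents.
lines = ["123 first", "45second", "no digits", "7"]

# Like ed's  1,$s/^[1234567890]*//  : strip leading digits from every line.
stripped = [re.sub(r"^[1234567890]*", "", s) for s in lines]
print(stripped)  # [' first', 'second', 'no digits', '']
```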


Any characters can appear within a character
class, and just to confuse the issue there are
essentially no special characters inside the brackets;
even the backslash doesn’t have a special
meaning. To search for special characters, for
example, you can say

/[.\$^[]/
Within [...], the ‘[’ is not special. To get a ‘]’
into a character class, make it the first character.


As a final frill on character classes, you can
specify a class that means ‘none of the following
characters’. This is done by beginning the class
with a ‘^’:


[^1234567890]


stands for ‘any character except a digit’. Thus 
you might find the first line that doesn’t begin 
with a tab or space by a search like 


/^[^(space)(tab)]/


Within a character class, the circumflex has
a special meaning only if it occurs at the beginning.
Just to convince yourself, verify that


/^[^^]/
finds a line that doesn’t begin with a circumflex.
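Both searches can be imitated in Python, with `re.match` standing in for an ed context search. `re.match` anchors at the start of the string, like a leading ‘^’. Two assumptions apply. First, the scan simply starts at line 1 instead of after dot. Second, inside a Python character class the backslash is special, so `\t` means a tab there, unlike in ed.

```python
import re

buffer = ["\tindented", "  also indented", "", "Flush left", "later"]

# Like ed's /^[^(space)(tab)]/ : first line starting with neither space nor tab.
# An empty line has no first character, so it does not qualify.
first = next(n for n, s in enumerate(buffer, start=1)
             if re.match(r"[^ \t]", s))
print(first)  # 4

# Like ed's /^[^^]/ : only the leading '^' negates; the second is literal.
no_caret = [s for s in ["^top", "plain"] if re.match(r"[^^]", s)]
print(no_caret)  # ['plain']
```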


The Ampersand ‘&’ 


The ampersand ‘&’ is used primarily to 
save typing. Suppose you have the line 


Now is the time 
and you want to make it 
Now is the best time 
Of course you can always say 
s/the/the best/ 


but it seems silly to have to repeat the ‘the’. 
The ‘&’ is used to eliminate the repetition. On 
the right side of a substitute, the ampersand 
means ‘whatever was just matched’, so you can 
say


s/the/& best/ 


and the ‘&’ will stand for ‘the’. Of course this 
isn’t much of a saving if the thing matched is 
just ‘the’, but if it is something truly long or 
awful, or if it is something like ‘.*’ which 
matches a lot of text, you can save some tedious 
typing. There is also much less chance of mak- 
ing a typing error in the replacement text. For 
example, to parenthesize a line, regardless of its 
length, 


s/.*/(&)/


The ampersand can occur more than once 
on the right side: 


s/the/& best and & worst/ 


makes 

Now is the best and the worst time 
and 

s/.*/&? &!!/
converts the original line into 


Now is the time? Now is the time!! 


To get a literal ampersand, naturally the 
backslash is used to turn off the special meaning: 


s/ampersand/\&/ 


converts the word into the symbol. Notice that 
‘&’ is not special on the left side of a substitute, 
only on the right side. 
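In Python's `re.sub`, used here only as an analogy, `\g<0>` (the whole match) plays the role of ed's ‘&’. A bare ‘&’ in a Python replacement is already literal. `count=1` keeps ‘.*’ from also matching the empty string at the end of the line, which Python would otherwise replace a second time.

```python
import re

line = "Now is the time"

best   = re.sub(r"the", r"\g<0> best", line, count=1)                  # s/the/& best/
both   = re.sub(r"the", r"\g<0> best and \g<0> worst", line, count=1)  # s/the/& best and & worst/
twice  = re.sub(r".*", r"\g<0>? \g<0>!!", line, count=1)               # s/.*/&? &!!/
parens = re.sub(r".*", r"(\g<0>)", line, count=1)                      # s/.*/(&)/

print(best)    # Now is the best time
print(both)    # Now is the best and the worst time
print(twice)   # Now is the time? Now is the time!!
print(parens)  # (Now is the time)
```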


Substituting Newlines 


ed provides a facility for splitting a single 
line into two or more shorter lines by ‘substitut- 
ing in a newline’. As the simplest example, sup- 
pose a line has gotten unmanageably long 
because of editing (or merely because it was 
unwisely typed). If it looks like 


text xy text


you can break it between the ‘x’ and the ‘y’ like 
this: 


s/xy/x\ 
y/ 


This is actually a single command, although it is 
typed on two lines. Bearing in mind that ‘\’ 
turns off special meanings, it seems relatively 
intuitive that a ‘\’ at the end of a line would 
make the newline there no longer special. 


You can in fact make a single line into 
several lines with this same mechanism. As a 
large example, consider underlining the word 
‘very’ in a long line by splitting ‘very’ onto a 
separate line, and preceding it by the roff or nroff 
formatting command ‘.ul’. 


text a very big text 
The command 


s/ very /\
.ul\
very\
/


converts the line into four shorter lines, preced- 
ing the word ‘very’ by the line ‘.ul’, and elim- 
inating the spaces around the ‘very’, all at the 
same time. 
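The same split can be sketched in Python. A ‘\n’ in the replacement plays the part of ed's backslash-newline, and splitting the result shows the separate lines. This is an analogy only; the input is the sample line used above.

```python
import re

line = "text a very big text"

# ed:  s/ very /\
#      .ul\
#      very\
#      /
pieces = re.sub(r" very ", "\n.ul\nvery\n", line, count=1).split("\n")
print(pieces)  # ['text a', '.ul', 'very', 'big text']
```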


When a newline is substituted in, dot is 
left pointing at the last line created. 


Regrettably there is no way to go in the
opposite direction: ed will not convert two lines
into one.


Rearranging a Line with \(...\)


(This section should be skipped on first 
reading.) Recall that ‘&’ is a shorthand that 
stands for whatever was matched by the left side 
of an s command. In much the same way you 
can capture separate pieces of what was matched; 
the only difference is that you have to specify on 
the left side just what pieces you’re interested in. 


Suppose, for instance, that you have a file 
of lines that consist of names in the form 


Smith, A. B. 
Jones, C. 


and so on, and you want the initials to precede 
the name, as in 


A. B. Smith 
C. Jones 


It is possible to do this with a series of editing 
commands, but it is tedious and error-prone. (It 
is instructive to figure out how it is done, 


though.) 


The alternative is to ‘tag’ the pieces of the 
pattern (in this case, the last name, and the ini- 
tials), and then rearrange the pieces. On the left 
side of a substitution, if part of the pattern is 
enclosed between \( and \), whatever matched 
that part is remembered, and available for use on 
the right side. On the right side, the symbol ‘\1’
refers to whatever matched the first \(...\) pair, 
‘\2’ to the second \(...\), and so on. 


The command
1,$s/^\([^,]*\), *\(.*\)/\2 \1/


although hard to read, does the job. The first
\(...\) matches the last name, which is any string
up to the comma; this is referred to on the right
side with ‘\1’. The second \(...\) is whatever
follows the comma and any spaces, and is
referred to as ‘\2’.
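For comparison, here is the same rearrangement in Python's `re` dialect. This is an analogy, not ed. Groups are written with plain parentheses instead of \( and \), but \1 and \2 in the replacement mean the same thing.

```python
import re

names = ["Smith, A. B.", "Jones, C."]

# ed:  1,$s/^\([^,]*\), *\(.*\)/\2 \1/
swapped = [re.sub(r"^([^,]*), *(.*)", r"\2 \1", s) for s in names]
print(swapped)  # ['A. B. Smith', 'C. Jones']
```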


Of course, with any editing sequence this 
complicated, it’s foolhardy to simply run it and 
hope. The global commands g and v discussed
in section 4 provide a way for you to print 
exactly those lines which were affected by the 
substitute command, and thus verify that it did 
what you wanted in all cases. 


3. LINE ADDRESSING IN THE EDITOR 


The next general area we will discuss is
that of line addressing in ed, that is, how you 
specify what lines are to be affected by editing 
commands. We have already used constructions 
like 


1,$s/x/y/ 


to specify a change on all lines. And most users 
are long since familiar with using a single new- 
line (or return) to print the next line, and with 


/thing/ 


to find a line that contains ‘thing’. Less familiar,
surprisingly enough, is the use of


?thing?


to scan backwards for the previous occurrence of
‘thing’. This is especially handy when you realize
that the thing you want to operate on is back
up the page from where you are currently editing.

The slash and question mark are the only 
characters you can use to delimit a context 
search, though you can use essentially any char- 
acter in a substitute command. 


Address Arithmetic 

The next step is to combine the line
numbers like ‘.’, ‘$’, ‘/.../’ and ‘?...?’ with ‘+’
and ‘-’. Thus

$-1


is a command to print the next to last line of the
current file (that is, one line before line ‘$’).
For example, to recall how far you got in a previ- 
ous editing session, 


$-5,$p


prints the last six lines. (Be sure you understand 
why it’s six, not five.) If there aren’t six, of 
course, you'll get an error message. 
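You can check the count of six with a little arithmetic on a hypothetical ten-line buffer, sketched here in Python. As in ed, lines are numbered from 1 and both ends of an address range are included.

```python
# Hypothetical 10-line buffer.
buffer = [f"line {n}" for n in range(1, 11)]
last = len(buffer)              # the value of '$'

first = last - 5                # the address '$-5'
shown = buffer[first - 1:last]  # '$-5,$p' prints lines first..last inclusive
print(len(shown), shown[0], shown[-1])  # 6 line 5 line 10
```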


As another example, 
.-3,.+3p


prints from three lines before where you are now 
(at line dot) to three lines after, thus giving you 
a bit of context. By the way, the ‘+’ can be 
omitted: 


.-3,.3p
is absolutely identical in meaning. 


Another area in which you can save typing
effort in specifying lines is to use ‘-’ and ‘+’ as
line numbers by themselves.


-
by itself is a command to move back up one line
in the file. In fact, you can string several minus
signs together to move back up that many lines:


---


moves up three lines, as does ‘-3’. Thus
-3,+3p
is also identical to the examples above.
Since ‘-’ is shorter than ‘.-1’, constructions
like
-,.s/bad/good/


are useful. This changes ‘bad’ to ‘good’ on the
previous line and on the current line.


‘+’ and ‘-’ can be used in combination
with searches using ‘/.../’ and ‘?...?’, and with
‘$’. The search


/thing/--


finds the line containing ‘thing’, and positions
you two lines before it.


Repeated Searches 
Suppose you ask for the search 


/horrible thing/ 


and when the line is printed you discover that it
isn’t the horrible thing that you wanted, so it is
necessary to repeat the search. You don’t
have to re-type the search, for the construction 


// 


is a shorthand for ‘the previous thing that was 
searched for’, whatever it was. This can be 
repeated as many times as necessary. You can 
also go backwards: 


??
searches for the same thing, but in the reverse 
direction. 


Not only can you repeat the search, but 
you can use ‘//’ as the left side of a substitute 
command, to mean ‘the most recent pattern’. 


/horrible thing/ 
... ed prints line with ‘horrible thing’ ...
s//good/p 


To go backwards and change a line, say 
??s//good/


Of course, you can still use the ‘&’ on the right 
hand side of a substitute to stand for whatever 
got matched: 


//s//& &/p


finds the next occurrence of whatever you 
searched for last, replaces it by two copies of 
itself, then prints the line just to verify that it 
worked. 


Default Line Numbers and the Value of Dot 


One of the most effective ways to speed up 
your editing is always to know what lines will be 
affected by a command if you don’t specify the 
lines it is to act on, and on what line you will be 
positioned (i.e., the value of dot) when a com- 
mand finishes. If you can edit without specifying 
unnecessary line numbers, you can save a lot of 
typing. 


As the most obvious example, if you issue 


a search command like 
/thing/ 


you are left pointing at the next line that contains
‘thing’. Then no address is required with
commands like s to make a substitution on that
line, or p to print it, or l to list it, or d to delete
it, or a to append text after it, or c to change it,
or i to insert text before it.


What happens if there was no ‘thing’? 
Then you are left right where you were — dot is 
unchanged. This is also true if you were sitting 
on the only ‘thing’ when you issued the command.
The same rules hold for searches that use
‘?...?’; the only difference is the direction in
which you search.


The delete command d leaves dot pointing 
at the line that followed the last deleted line. 
When line ‘$’ gets deleted, however, dot points
at the new line ‘$’.


The line-changing commands a, c and i by
default all affect the current line: if you give
no line number with them, a appends text after
the current line, c changes the current line, and i
inserts text before the current line.


a, c, and i behave identically in one 
respect — when you stop appending, changing or 
inserting, dot points at the last line entered. 
This is exactly what you want for typing and edit- 
ing on the fly. For example, you can say 


a
... text ...
... botch ... (minor error)
.
s/botch/correct/ (fix botched line)
a
... more text ...


without specifying any line number for the sub- 
stitute command or for the second append com- 
mand. Or you can say 


a
... text ...
... horrible botch ... (major error)
.
c (replace entire line)
... fixed up line ...


You should experiment to determine what
happens if you add no lines with a, c or i.


The r command will read a file into the 
text being edited, either at the end if you give no 
address, or after the specified line if you do. In 
either case, dot points at the last line read in. 
Remember that you can even say 0r to read a
file in at the beginning of the text. (You can
also say 0a or 1i to start adding text at the beginning.)


The w command writes out the entire file. 
If you precede the command by one line 
number, that line is written, while if you precede 
it by two line numbers, that range of lines is 
written. The w command does not change dot:
the current line remains the same, regardless of 
what lines are written. This is true even if you 
say something like 


/^\.AB/,/^\.AE/w abstract


which involves a context search. 


Since the w command is so easy to use, 
you should save what you are editing regularly as 
you go along just in case the system crashes, or 
in case you do something foolish, like clobbering 
what you're editing. 


The least intuitive behavior, in a sense, is 
that of the s command. The rule is simple — 
you are left sitting on the last line that matched 
the pattern. If there were no matches, then dot 
is unchanged. 


To illustrate, suppose that there are three 
lines in the buffer, and you are sitting on the 
middle one: 


x1
x2
x3


Then the command
-,+s/x/y/p


prints the third line, which is the last one
changed. But if the three lines had been


x1
y2
y3


and the same command had been issued while
dot pointed at the second line, then the result 
would be to change and print only the first line, 
and that is where dot would be set. 


Semicolon ‘;’ 


Searches with ‘/.../’ and ‘?...?’ start at the
current line and move forward or backward 
respectively until they either find the pattern or 
get back to the current line. Sometimes this is 
not what is wanted. Suppose, for example, that 
the buffer contains lines like this: 


...
ab
...
bc
...


Starting at line 1, one would expect that the
command


/a/,/b/p


prints all the lines from the ‘ab’ to the ‘bc’
inclusive. Actually this is not what happens. 
Both searches (for ‘a’ and for ‘b’) start from the 
same point, and thus they both find the line that 
contains ‘ab’. The result is to print a single line. 
Worse, if there had been a line with a ‘b’ in it 
before the ‘ab’ line, then the print command 
would be in error, since the second line number 
would be less than the first, and it is illegal to try 
to print lines in reverse order. 


This is because the comma separator for 
line numbers doesn’t set dot as each address is 
processed; each search starts from the same 
place. In ed, the semicolon ‘;’ can be used just
like comma, with the single difference that use
of a semicolon forces dot to be set at that point
as the line numbers are being evaluated. In
effect, the semicolon ‘moves’ dot. Thus in our 
example above, the command 


/a/;/b/p


prints the range of lines from ‘ab’ to ‘bc’,
because after the ‘a’ is found, dot is set to that 
line, and then ‘b’ is searched for, starting beyond 
that line. 
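A few lines of Python can model the difference between ‘,’ and ‘;’. The `search` function below is a hypothetical simplification of an ed forward search: it scans from the line after the starting point, wraps around, and stops at the first match. The buffer is a made-up stand-in for the ‘ab’/‘bc’ example.

```python
import re

# Hypothetical buffer; lines are numbered from 1 as in ed.
buffer = ["intro", "ab", "middle", "bc", "end"]

def search(pattern, start):
    """Model of /pattern/: scan forward from the line after `start`,
    wrapping around, and return the first matching line number."""
    n = len(buffer)
    for k in range(1, n + 1):
        line_no = (start - 1 + k) % n + 1
        if re.search(pattern, buffer[line_no - 1]):
            return line_no
    raise ValueError("no match")  # ed would just print '?'

dot = 1
comma = (search("a", dot), search("b", dot))  # /a/,/b/ : both start at dot
first = search("a", dot)
semicolon = (first, search("b", first))       # /a/;/b/ : dot moves first
print(comma, semicolon)  # (2, 2) (2, 4)
```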


This property is most often useful in a 
very simple situation. Suppose you want to find 
the second occurrence of ‘thing’. You could say


/thing/
//


but this prints the first occurrence as well as the 
second, and is a nuisance when you know very 
well that it is only the second one you're 
interested in. The solution is to say 


/thing/;// 


This says to find the first occurrence of ‘thing’, 
set dot to that line, then find the second and 
print only that. 


Closely related is searching for the second 
previous occurrence of something, as in 


?something?;?? 


Printing the third or fourth or ... in either direction
is left as an exercise.


Finally, bear in mind that if you want to 
find the first occurrence of something in a file, 
starting at an arbitrary place within the file, it is 
not sufficient to say 


1;/thing/ 


because this fails if ‘thing’ occurs on line 1. But 
it is possible to say 


0;/thing/ 


(one of the few places where 0 is a legal line 
number), for this starts the search at line 1. 


Interrupting the Editor 


As a final note on what dot gets set to, you 
should be aware that if you hit the interrupt or 
delete or rubout or break key while ed is doing a 
command, things are put back together again and 
your state is restored as much as possible to what 
it was before the command began. Naturally, 
some changes are irrevocable: if you are reading
or writing a file or making substitutions or
deleting lines, these will be stopped in some
clean but unpredictable state in the middle
(which is why it is not usually wise to stop
them). Dot may or may not be changed.


Printing is more clear-cut. Dot is not
changed until the printing is done. Thus if you
print until you see an interesting line, then hit
delete, you are not sitting on that line or even
near it. Dot is left where it was when the p command
was started.
4. GLOBAL COMMANDS


The global commands g and v are used to 
perform one or more editing commands on all 
lines that either contain (g) or don’t contain (v) 
a specified pattern. 


As the simplest example, the command 
g/UNIX/p 


prints all lines that contain the word UNIX. The 
pattern that goes between the slashes can be any- 
thing that could be used in a line search or in a 
substitute command; exactly the same rules and 
limitations apply. 


As another example, then, 
g/^\./p


prints all the formatting commands in a file
(lines that begin with ‘.’).


The v command is identical to g, except
that it operates on those lines that do not contain
an occurrence of the pattern. (There is no
mnemonic significance to the letter ‘v’.) So


v/^\./p


prints all the lines that don’t begin with ‘.’ — the 
actual text lines. 


The command that follows g or v can be 
anything: 


g/^\./d

deletes all lines that begin with ‘.’, and
g/^$/d

deletes all empty lines. 


Probably the most useful command that
can follow a global is the substitute command,
for this can be used to make a change and print
each affected line for verification. For example,
we could change the word UNIX to ‘Unix’
everywhere, and verify that it really worked, with 


g/UNIX/s//Unix/gp 


Notice that we used ‘//’ in the substitute command
to mean ‘the previous pattern’, in this
case, UNIX. The p command is done on every 
line that matches the pattern, not just those on 
which a substitution took place. 


The global command operates by making
two passes over the file. On the first pass, all
lines that match the pattern are marked. On the
second pass, each marked line in turn is examined,
dot is set to that line, and the command
executed. This means that it is possible for the
command that follows a g or v to use addresses,
set dot, and so on quite freely.
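The two-pass scheme can be sketched in Python as a rough model of `g/UNIX/s//Unix/gp`. The function and buffer are hypothetical, and Python's `re` stands in for ed's patterns. The first pass collects every matching line number; the second sets dot to each marked line in turn and runs the command there.

```python
import re

buffer = ["UNIX is here", "nothing", "more UNIX and UNIX"]

def global_subst(pattern, replacement):
    """Rough model of g/pattern/s//replacement/gp on `buffer`."""
    # Pass 1: mark every line that matches the pattern.
    marked = [n for n, s in enumerate(buffer, start=1) if re.search(pattern, s)]
    printed = []
    # Pass 2: set dot to each marked line in turn and run the command there.
    for dot in marked:
        buffer[dot - 1] = re.sub(pattern, replacement, buffer[dot - 1])  # s//.../g
        printed.append(buffer[dot - 1])                                   # p
    return printed

printed = global_subst("UNIX", "Unix")
print(printed)  # ['Unix is here', 'more Unix and Unix']
```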


g/^\.PP/+


prints the line that follows each ‘.PP’ command 
(the signal for a new paragraph in some format- 
ting packages). Remember that ‘+’ means ‘one 
line past dot’. And 


g/topic/?^\.SH?1


searches for each line that contains ‘topic’, scans 
backwards until it finds a line that begins ‘.SH’ 
(a section heading) and prints the line that fol- 
lows that, thus showing the section headings 
under which ‘topic’ is mentioned. Finally, 


g/^\.EQ/+,/^\.EN/-p


prints all the lines that lie between ‘.EQ’ and
‘.EN’ formatting commands.


The g and v commands can also be pre- 
ceded by line numbers, in which case the lines 
searched are only those in the range specified. 


Multi-line Global Commands


It is possible to do more than one com- 
mand under the control of a global command, 
although the syntax for expressing the operation 
is not especially natural or pleasant. As an 
example, suppose the task is to change ‘x’ to ‘y’ 
and ‘a’ to ‘b’ on all lines that contain ‘thing’.
Then 


g/thing/s/x/y/\ 
s/a/b/ 


is sufficient. The ‘\’ signals the g command that 
the set of commands continues on the next line; 
it terminates on the first line that does not end 
with ‘\’. (As a minor blemish, you can’t use a 
substitute command to insert a newline within a 
g command.) 


You should watch out for this problem: 
the command 


g/x/s//y/\
s/a/b/ 


does not work as you expect. The remembered 
pattern is the last pattern that was actually exe- 
cuted, so sometimes it will be ‘x’ (as expected), 
and sometimes it will be ‘a’ (not expected). You 
must spell it out, like this: 


g/x/s/x/y/\ 
s/a/b/ 


It is also possible to execute a, c and i
commands under a global command; as with
other multi-line constructions, all that is needed
is to add a ‘\’ at the end of each line except the
last. Thus to add a ‘.nf’ and ‘.sp’ command
before each ‘.EQ’ line, type


g/^\.EQ/i\
.nf\
.sp


There is no need for a final line containing a ‘.’ 
to terminate the i command, unless there are 
further commands being done under the global. 
On the other hand, it does no harm to put it in 
either. 


5. CUT AND PASTE WITH UNIX COMMANDS


One editing area in which non-programmers
seem not very confident is in what
might be called ‘cut and paste’ operations:
changing the name of a file, making a copy of a
file somewhere else, moving a few lines from
one place to another in a file, inserting one file in
the middle of another, splitting a file into pieces,
and splicing two or more files together.


Yet most of these operations are actually 
quite easy, if you keep your wits about you and 
go cautiously. The next several sections talk 
about cut and paste. We will begin with the 
UNIX commands for moving entire files around, 
then discuss ed commands for operating on 
pieces of files. 


Changing the Name of a File 


You have a file named ‘memo’ and you 
want it to be called ‘paper’ instead. How is it 
done? 


The UNIX program that renames files is 
called mv (for ‘move’); it ‘moves’ the file from 
one name to another, like this: 


mv memo paper 


That’s all there is to it: mv from the old name to
the new name.


mv oldname newname 


Warning: if there is already a file around with the 
new name, its present contents will be silently 
clobbered by the information from the other file. 
The one exception is that you can’t move a file 
to itself — 


mv x x


is illegal. 


Making a Copy of a File 


Sometimes what you want is a copy of a
file, an entirely fresh version. This might be
because you want to work on a file, and yet save
a copy in case something gets fouled up, or just
because you're paranoid.


In any case, the way to do it is with the cp
command. (cp stands for ‘copy’; UNIX is big on 
short command names, which are appreciated by 
heavy users, but sometimes a strain for novices.) 
Suppose you have a file called ‘good’ and you 
want to save a copy before you make some 
dramatic editing changes. Choose a name — 
‘savegood’ might be acceptable — then type 


cp good savegood 


This copies ‘good’ onto ‘savegood’, and you now 
have two identical copies of the file ‘good’. (If 
‘savegood’ previously contained something, it
gets overwritten.) 


Now if you decide at some time that you 
want to get back to the original state of ‘good’, 
you can say 


mv savegood good 


(if you’re not interested in ‘savegood’ anymore), 
or 


cp savegood good 


if you still want to retain a safe copy. 


In summary, mv just renames a file; cp
makes a duplicate copy. Both of them clobber
the ‘target’ file if it already exists, so you had
better be sure that’s what you want to do before
you do it.


Removing a File 


If you decide you are really done with a 
file forever, you can remove it with the rm com- 
mand: 


rm savegood 


throws away (irrevocably) the file called
‘savegood’.


Putting Two or More Files Together 


The next step is the familiar one of collect- 
ing two or more files into one big one. This will 
be needed, for example, when the author of a 
paper decides that several sections need to be 
combined into one. There are several ways to do 
it, of which the cleanest, once you get used to it, 
is a program called cat. (Not all UNIX programs
have two-letter names.) cat is short for ‘concatenate’,
which is exactly what we want to do.


Suppose the job is to combine the files
‘file1’ and ‘file2’ into a single file called ‘bigfile’.
If you say


cat file


the contents of ‘file’ will get printed on your terminal.
If you say


cat file1 file2


the contents of ‘file1’ and then the contents of
‘file2’ will both be printed on your terminal, in
that order. So cat combines the files, all right,
but it’s not much help to print them on the terminal:
we want them in ‘bigfile’.


Fortunately, there is a way. You can tell 
UNIX that instead of printing on your terminal, 
you want the same information put in a file. The 
way to do it is to add to the command line the 
character > and the name of the file where you 
want the output to go. Then you can say 


cat file1 file2 >bigfile


and the job is done. (As with cp and mv, you’re
putting something into ‘bigfile’, and anything 
that was already there is destroyed.) 

This ability to ‘capture’ the output of a 
program is one of the most useful aspects of 
UNIX. Fortunately it’s not limited to the cat 
program — you can use it with any program that 
prints on your terminal. We'll see some more 
uses for it in a moment. 


Naturally, you can combine several files,
not just two:


cat file1 file2 file3 ... >bigfile


collects a whole bunch.


Question: is there any difference between 
cp good savegood 

and 
cat good >savegood 


Answer: for most purposes, no. You might rea- 
sonably ask why there are two programs in that 
case, since cat is obviously all you need. The 
answer is that cp will do some other things as 
well, which you can investigate for yourself by
reading the manual. For now we'll stick to sim- 
ple usages. 


Adding Something to the End of a File 


Sometimes you want to add one file to the 
end of another. We have enough building blocks 
now that you can do it; in fact before reading 
further it would be valuable if you figured out 
how. To be specific, how would you use cp, mv 
and/or cat to add the file ‘good1’ to the end of
the file ‘good’?


You could try


cat good good1 >temp
mv temp good


which is probably most direct. You should also
understand why


cat good good1 >good


doesn’t work. (Don’t practice with a good
‘good’!)

The easy way is to use a variant of >,
called >>. In fact, >> is identical to > except
that instead of clobbering the old file, it simply
tacks stuff on at the end. Thus you could say


cat good1 >>good


and ‘good1’ is added to the end of ‘good’. (And
if ‘good’ didn’t exist, this makes a copy of
‘good1’ called ‘good’.)
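The difference between > and >> resembles the difference between opening a file for writing and opening it for appending. The Python sketch below is an analogy, not a description of how the shell works internally. The file names follow the example and live in a temporary directory. The sketch also shows why `cat good good1 >good` fails: the output file is truncated as soon as it is opened, before its old contents can be read.

```python
import os
import tempfile

def write(path, text):
    with open(path, "w") as f:
        f.write(text)

def read(path):
    with open(path) as f:
        return f.read()

with tempfile.TemporaryDirectory() as d:
    good = os.path.join(d, "good")
    good1 = os.path.join(d, "good1")
    write(good, "first\n")
    write(good1, "second\n")

    # Like  cat good1 >>good : append mode keeps what is already there.
    with open(good, "a") as out:
        out.write(read(good1))
    appended = read(good)

    # Like  cat good good1 >good : opening for writing truncates 'good'
    # before its old contents are read, so only good1's text survives.
    write(good, "first\n")
    with open(good, "w") as out:
        out.write(read(good) + read(good1))
    clobbered = read(good)

print(repr(appended))   # 'first\nsecond\n'
print(repr(clobbered))  # 'second\n'
```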

6. CUT AND PASTE WITH THE EDITOR 


Now we move on to manipulating pieces 
of files — individual lines or groups of lines. 
This is another area where new users seem 
unsure of themselves. 


Filenames 


The first step is to ensure that you know 
the ed commands for reading and writing files. 
Of course you can’t go very far without knowing 
r and w. Equally useful, but less well known, is
the ‘edit’ command e. Within ed, the command


e newfile


says ‘I want to edit a new file called newfile,
without leaving the editor.’ The e command discards
whatever you're currently working on and
starts over on newfile. It’s exactly the same as if
you had quit with the q command, then re- 
entered ed with a new file name, except that if 
you have a pattern remembered, then a com- 
mand like // will still work. 


If you enter ed with the command 
ed file 


ed remembers the name of the file, and any sub- 
sequent e, r or w commands that don’t contain a 
filename will refer to this remembered file. Thus 


ed file1

... (editing) ...

w (writes back in file1)

e file2 (edit new file, without leaving editor)
... (editing on file2) ...

w (writes back on file2)


(and so on) does a series of edits on various files
without ever leaving ed and without typing the
name of any file more than once.


You can find out the remembered file 
name at any time with the f command; just type 
f without a file name. You can also change the 
name of the remembered file name with f; a use- 
ful sequence is 


ed precious 
f junk 
... (editing) ... 


which gets a copy of a precious file, then uses f 
to guarantee that a careless w command won't 
clobber the original. 


Inserting One File into Another 


Suppose you have a file called ‘memo’, 
and you want the file called ‘table’ to be inserted 
just after the reference to Table 1. That is, in 
‘memo’ somewhere is a line that says 


Table 1 shows that ...


and the data contained in ‘table’ has to go there, 
probably so it will be formatted properly by nroff 
or troff. Now what? 


This one is easy. Edit ‘memo’, find ‘Table
1’, and add the file ‘table’ right there:


ed memo 

/Table 1/ 

Table 1 shows that ... [response from ed]
.r table


The critical line is the last one. As we said ear- 
lier, the r command reads a file; here you asked 
for it to be read in right after line dot. An r 
command without any address adds lines at the 
end, so it is the same as $r.


Writing out Part of a File 


The other side of the coin is writing out 
part of the document you’re editing. For exam- 
ple, maybe you want to split out into a separate 
file that table from the previous example, so it 
can be formatted and tested separately. Suppose 
that in the file being edited we have 


.TS
... [lots of stuff]
.TE


which is the way a table is set up for the tbl program.
To isolate the table in a separate file
called ‘table’, first find the start of the table (the
‘.TS’ line), then write out the interesting part:


/^\.TS/
.TS [ed prints the line it found]
.,/^\.TE/w table


and the job is done. If you are confident, you
can do it all at once with


/^\.TS/;/^\.TE/w table


The point is that the w command can write
out a group of lines, instead of the whole file. In
fact, you can write out a single line if you like;
just give one line number instead of two. For
example, if you have just typed a horribly complicated
line and you know that it (or something
like it) is going to be needed later, then save it;
don’t re-type it. In the editor, say


a
...lots of stuff...
...horrible line...
.
.w temp
a
...more stuff...
.
.r temp
a
...more stuff...
.


This last example is worth studying, to be sure 
you appreciate what’s going on. 


Moving Lines Around 


Suppose you want to move a paragraph 
from its present position in a paper to the end. 
How would you do it? As a concrete example, 
suppose each paragraph in the paper begins with 
the formatting command ‘.PP’. Think about it 
and write down the details before reading on. 


The brute force way (not necessarily bad) 
is to write the paragraph onto a temporary file, 
delete it from its current position, then read in 
the temporary file at the end. Assuming that 
you are sitting on the ‘.PP’ command that begins 
the paragraph, this is the sequence of commands: 


.,/^\.PP/-w temp
.,//-d
$r temp


That is, from where you are now (‘.’) until one
line before the next ‘.PP’ (‘/^\.PP/-’) write
onto ‘temp’. Then delete the same lines.
Finally, read ‘temp’ at the end.


As we said, that’s the brute force way. 
The easier way (often) is to use the move com- 
mand m that ed provides — it lets you do the 
whole set of operations at one crack, without any 
temporary file. 


The m command is like many other ed
commands in that it takes up to two line
numbers in front that tell what lines are to be
affected. It is also followed by a line number that
tells where the lines are to go. Thus


line1, line2 m line3


says to move all the lines between ‘line1’ and
‘line2’ after ‘line3’. Naturally, any of ‘line1’,
etc., can be patterns between slashes, $ signs, or
other ways to specify lines.


Suppose again that you’re sitting at the
first line of the paragraph. Then you can say


.,/^\.PP/-m$


That’s all.


As another example of a frequent operation,
you can reverse the order of two adjacent
lines by moving the first one to after the second.
Suppose that you are positioned at the first.
Then


m+


does it. It says to move line dot to after one line
after line dot.


As you can see, the m command is more
succinct and direct than writing, deleting and
re-reading. When is brute force better anyway?
This is a matter of personal taste: do what you
have most confidence in. The main difficulty
with the m command is that if you use patterns
to specify both the lines you are moving and the
target, you have to take care that you specify
them properly, or you may well not move the
lines you thought you did. The result of a
botched m command can often be a mess.
Doing the job a step at a time makes it easier for
you to verify at each step that you accomplished
what you wanted to. It’s also a good idea to
issue a w command before doing anything
complicated; then if you goof, it’s easy to back up to
where you were.


Marks


ed provides a facility for marking a line
with a particular name so you can later reference
it by name regardless of its actual line number.
This can be handy for moving lines, and for
keeping track of them as they move. The mark
command is k; the command


kx


marks the current line with the name ‘x’. If a
line number precedes the k, that line is marked.
(The mark name must be a single lower case
letter.) Now you can refer to the marked line
with the address


'x


Marks are most useful for moving things
around. Find the first line of the block to be
moved, and mark it with ka. Then find the last
line and mark it with kb. Now position yourself
at the place where the stuff is to go and say


'a,'bm

Bear in mind that only one line can have a 
particular mark name associated with it at any 
given time. 


Copying Lines 


We mentioned earlier the idea of saving a 
line that was hard to type or used often, so as to 
cut down on typing time. Of course this could 
be more than one line; then the saving is 
presumably even greater. 


ed provides another command, called t
(for ‘transfer’), for making a copy of a group of
one or more lines at any point. This is often
easier than writing and reading.


The t command is identical to the m com- 
mand, except that instead of moving lines it sim- 
ply duplicates them at the place you named. 
Thus 


1,$t$


duplicates the entire contents of the file you are editing.
A more common use for t is for creating a
series of lines that differ only slightly. For 
example, you can say 


a
..........x..........   (long line)
.
t.        (make a copy)
s/x/y/    (change it a bit)
t.        (make third copy)
s/y/z/    (change it a bit)
and so on. 
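The m and t commands can be modeled on an array of lines. The C sketch below is a hypothetical illustration, not ed's own code: the names struct buffer, move_lines, and transfer_lines are invented here, the buffer holds at most MAXLINES lines, and move_lines simply refuses a destination that lies inside the block being moved.

```c
#include <assert.h>
#include <string.h>

#define MAXLINES 100

/* Hypothetical buffer: lines[1..n] hold the text; index 0 is unused so
   that indexes match ed's 1-based line numbers. */
struct buffer {
    const char *lines[MAXLINES + 1];
    int n;
};

/* "l1,l2 m l3": move lines l1..l2 to just after line l3 (0 = top). */
int move_lines(struct buffer *b, int l1, int l2, int l3)
{
    const char *out[MAXLINES + 1];
    int k = 1, i, j;

    if (l1 < 1 || l1 > l2 || l2 > b->n || l3 < 0 || l3 > b->n)
        return -1;
    if (l3 >= l1 && l3 <= l2)
        return -1;                        /* destination inside the block */
    if (l3 == 0)                          /* block goes before line 1 */
        for (j = l1; j <= l2; j++)
            out[k++] = b->lines[j];
    for (i = 1; i <= b->n; i++) {
        if (i >= l1 && i <= l2)
            continue;                     /* skip the block's old position */
        out[k++] = b->lines[i];
        if (i == l3)                      /* drop the block in after line l3 */
            for (j = l1; j <= l2; j++)
                out[k++] = b->lines[j];
    }
    memcpy(&b->lines[1], &out[1], (size_t)b->n * sizeof out[0]);
    return 0;
}

/* "l1,l2 t l3": copy lines l1..l2 to just after line l3; originals stay. */
int transfer_lines(struct buffer *b, int l1, int l2, int l3)
{
    const char *out[MAXLINES + 1];
    int k = 1, i;

    if (l1 < 1 || l1 > l2 || l2 > b->n || l3 < 0 || l3 > b->n ||
        b->n + (l2 - l1 + 1) > MAXLINES)
        return -1;
    for (i = 1; i <= l3; i++)             /* lines up to the destination */
        out[k++] = b->lines[i];
    for (i = l1; i <= l2; i++)            /* the copies */
        out[k++] = b->lines[i];
    for (i = l3 + 1; i <= b->n; i++)      /* the rest of the buffer */
        out[k++] = b->lines[i];
    b->n = k - 1;
    memcpy(&b->lines[1], &out[1], (size_t)b->n * sizeof out[0]);
    return 0;
}
```

In these terms, m+ on line dot corresponds to move_lines(&b, dot, dot, dot + 1), and 1,$t$ to transfer_lines(&b, 1, b.n, b.n).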


The Temporary Escape ‘!’ 


Sometimes it is convenient to be able to 
temporarily escape from the editor to do some 
other UNIX command, perhaps one of the file 
copy or move commands discussed in section 5, 
without leaving the editor. The ‘escape’ com- 
mand ! provides a way to do this. 


If you say


!any UNIX command


your current editing state is suspended, and the
UNIX command you asked for is executed.
When the command finishes, ed will signal you
by printing another !; at that point you can
resume editing.


You can really do any UNIX command, 
including another ed. (This is quite common, in 
fact.) In this case, you can even do another !. 
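The escape amounts to running one command and waiting for it. As a rough C sketch, assuming the standard system() routine (which passes the text to sh and waits for it) is an acceptable stand-in for whatever ed does internally, the hypothetical function shell_escape below suspends its caller and then prints the ! that says the command has finished.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical model of ed's "!" escape: hand one command line to the
   shell, stay suspended until it finishes, then print "!" before
   resuming. Returns the command's exit code, or -1 on failure. */
int shell_escape(const char *command)
{
    int status;

    fflush(stdout);              /* keep earlier output ahead of the command's */
    status = system(command);    /* runs the command via sh and waits for it */
    printf("!\n");
    fflush(stdout);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```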


7. SUPPORTING TOOLS 


There are several tools and techniques that 
go along with the editor, all of which are rela- 
tively easy once you know how ed works, 
because they are all based on the editor. In this 
section we will give some fairly cursory examples
of these tools, more to indicate their existence
than to provide a complete tutorial. More
information on each can be found in [3].


Grep 


Sometimes you want to find all
occurrences of some word or pattern in a set of
files, to edit them or perhaps just to verify their 
presence or absence. It may be possible to edit 
each file separately and look for the pattern of 
interest, but if there are many files this can get 
very tedious, and if the files are big (more than 
three or four thousand lines) it is impossible 
because of limits in ed. 


The program grep was invented to get 
around these limitations. The patterns that we 
have described in the paper are often called ‘reg- 
ular expressions’, and ‘grep’ stands for 


g/re/p 


That describes exactly what grep does — it prints 
every line in a set of files that contains a particu- 
lar pattern. Thus 


grep 'thing' file1 file2 file3 ...


finds ‘thing’ wherever it occurs in any of the files
‘file1’, ‘file2’, etc. grep also indicates the file in
which the line was found, so you can later edit it 
if you like. 


The pattern represented by ‘thing’ can be 
any pattern you can use in the editor, since grep 
and ed use exactly the same mechanism for pattern
searching. It is wisest always to enclose the
pattern in the single quotes '...' if it contains any
non-alphabetic characters, since many such char- 
acters also mean something special to the UNIX 
command interpreter (the ‘shell’). If you don’t 
quote them, the command interpreter will try to 
interpret them before grep gets a chance. 


There is also a way to find lines that don’t 
contain a pattern: 


grep -v 'thing' file1 file2 ...


finds all lines that don’t contain ‘thing’. The
-v must occur in the position shown. Given
grep and grep -v, it is possible to do things like
selecting all lines that contain some combination
of patterns. For example, to get all lines that
contain ‘x’ but not ‘y’:


grep x file... | grep -v y


(The notation | is a ‘pipe’, which causes the out- 
put of the first command to be used as input to 
the second command; see [2].) 
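The idea behind grep can be sketched in a few lines of C. This is an illustration, not grep's source: it assumes the POSIX regcomp and regexec routines, whose basic regular expressions are close to, but not guaranteed identical with, the editor's patterns. The hypothetical grep_stream reads one stream; a nonzero invert selects the non-matching lines, as -v does.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Copy to out every line of in that matches the basic regular expression
   pattern, or, with invert nonzero, every line that does not (like -v).
   Returns the number of lines written, or -1 if the pattern is invalid.
   Lines longer than the buffer are split in this sketch. */
int grep_stream(FILE *in, FILE *out, const char *pattern, int invert)
{
    regex_t re;
    char line[1024];
    int written = 0;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);
        int matched;

        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';         /* match against the text only */
        matched = regexec(&re, line, 0, NULL, 0) == 0;
        if (matched != (invert != 0)) {
            fprintf(out, "%s\n", line);
            written++;
        }
    }
    regfree(&re);
    return written;
}
```

Running grep_stream twice through a temporary file, first with the pattern x and then with y inverted, selects the same lines as grep x file... | grep -v y.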


Editing Scripts 

If a fairly complicated set of editing opera- 
tions is to be done on a whole set of files, the 
easiest thing to do is to make up a ‘script’, i.e., a
file that contains the operations you want to per- 
form, then apply this script to each file in turn. 


For example, suppose you want to change
every UNIX to Unix and every GCOS to Gcos in
a large number of files. Then put into the file 
‘script’ the lines 


g/UNIX/s//Unix/g
g/GCOS/s//Gcos/g
w
q


Now you can say 


ed file1 <script
ed file2 <script


This causes ed to take its commands from the 
prepared script. Notice that the whole job has to 
be planned in advance. 


And of course by using the UNIX com- 
mand interpreter, you can cycle through a set of 
files automatically, with varying degrees of ease. 


Sed 


sed (‘stream editor’) is a version of the 
editor with restricted capabilities but which is
capable of processing unlimited amounts of 
input. Basically sed copies its input to its output, 
applying one or more editing commands to each 
line of input. 


As an example, suppose that we want to 
do the UNIX to Unix part of the example given 
above, but without rewriting the files. Then the 
command 


sed 's/UNIX/Unix/g' file1 file2 ...


applies the command ‘s/UNIX/Unix/g’ to all
lines from ‘file1’, ‘file2’, etc., and copies all lines
to the output. The advantage of using sed in
such a case is that it can be used with input too
large for ed to handle, and that all the output
can be collected in one place, either in a file or
perhaps piped into another program.

If the editing transformation is so compli- 
cated that more than one editing command is 
needed, commands can be supplied from a file, 
or on the command line, with a slightly more 
complex syntax. To take commands from a file, 
for example, 


sed -f cmdfile input-files...


sed has further capabilities, including conditional
testing and branching, which we cannot
go into here. 
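The core of the s/UNIX/Unix/g example, applied line by line, looks roughly like the C sketch below. It is a simplification, not sed's implementation: the hypothetical subst_all replaces a literal string rather than a regular expression, and the hypothetical sed_stream assumes input lines shorter than 1024 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy line to out, replacing every occurrence of the literal string from
   with to (like s/from/to/g without metacharacters).
   Returns 0, or -1 if from is empty or out is too small. */
int subst_all(const char *line, const char *from, const char *to,
              char *out, size_t outsz)
{
    size_t flen = strlen(from), tlen = strlen(to), used = 0;
    const char *p = line, *hit;

    if (flen == 0)
        return -1;
    while ((hit = strstr(p, from)) != NULL) {
        size_t pre = (size_t)(hit - p);   /* text before the match */
        if (used + pre + tlen >= outsz)
            return -1;
        memcpy(out + used, p, pre);
        memcpy(out + used + pre, to, tlen);
        used += pre + tlen;
        p = hit + flen;                   /* continue after the match */
    }
    if (used + strlen(p) >= outsz)
        return -1;
    strcpy(out + used, p);                /* the unmatched tail */
    return 0;
}

/* Like sed with one command: edit each input line, copy every line out. */
int sed_stream(FILE *in, FILE *out, const char *from, const char *to)
{
    char line[1024], result[2048];

    while (fgets(line, sizeof line, in) != NULL) {
        if (subst_all(line, from, to, result, sizeof result) != 0)
            return -1;
        fputs(result, out);
    }
    return 0;
}
```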


The command language for PWB/UNIX* is a high-level programming language that is an extended version
of the UNIX Shell. By utilizing the Shell as a programming language, one can eliminate much of
the programming drudgery that often accompanies a large project. Many manual procedures can be
quickly, cheaply, and conveniently automated. Because it is so easy to create and use Shell procedures,
individual users and entire projects can customize the general PWB/UNIX environment into one tailored
to their own respective requirements, organizational structure, and terminology.


This paper is actually a combination of several tutorials, as explained in {1}.¹ Some sections provide a
basic tutorial for relatively new users. Other sections are intended for more experienced users and 
introduce them to Shell programming. Finally, some hints on programming techniques and efficiency 
are offered for those who make especially heavy use of Shell programming. 


The accuracy of this tutorial is guaranteed only for the Shell of PWB/UNIX Edition 1.0. Other versions
of UNIX have other Shells. Although many of the basic concepts are similar, there exist many 
differences in features, especially those used to support Shell programming. 


1. INTRODUCTION 


In any programming project, some effort is used to build the end product. The remainder is consumed 
in building the supporting tools and procedures used to manage and maintain the end product. The 
second effort can far exceed the first, especially in larger projects. A good command language can be 
an invaluable tool for such projects. If it is a flexible programming language, it can be used to solve 
many internal support problems, without requiring compilable programs to be written, debugged, and 
maintained; its most important advantage is the ability to get the job done now. For a perspective on 
the motivations for using a command language in this way, see [1,2,6]. 


When users log into a PWB/UNIX system, they communicate with an instance of the Shell that reads 
commands typed at the terminal and arranges for their execution. Thus, the Shell’s most important 
function is to provide a good interface for human beings. In addition, a sequence of commands may be 
preserved for later use by saving them in a file, called a Shell procedure, a command file, or a runcom,
according to local preferences. 


Some users need little knowledge of the Shell to do their work; others choose to make heavy use of its 
programming features. This tutorial may be read in several different ways, depending on the reader’s 
interests. A brief discussion of the PWB/UNIX environment is found in {2}. The discussion in {3} covers
aspects of the Shell that are important for everyone, while all of {4} and most of {5} are mainly of
interest to those who write Shell procedures. A group of annotated Shell procedures is given in {6}. 
Finally, a brief discussion of efficiency is offered in {7}. This is found in its proper place (the end), and 
is intended for those who write especially time-consuming Shell procedures. 


Complete beginners should not be reading this tutorial, but should work their way through other available
tutorials first. See [7] for an appropriate plan of study. All the commands mentioned below are
described in Section I of the PWB/UNIX User’s Manual [3], while system calls are described in Section II
and subroutines in Section III thereof. 


2. OVERVIEW OF THE UNIX ENVIRONMENT 


Full understanding of some later discussions depends on familiarity with PWB/UNIX; [9] is most useful
for that, and it would be helpful to read at least one of [4,5,10]. For completeness, a short overview of 
the most relevant concepts is given below. 


* UNIX is a Trademark/Service Mark of the Bell System. 
1. The notation {n} refers to Section n of this tutorial.


2.1 File System 


The PWB/UNIX file system’s overall structure is that of a rooted tree composed of directories and other 
files. A file name is a sequence of characters. A pathname is a sequence of directory names followed by 
a file name, each separated from the previous one by a slash (/). If a pathname begins with a ‘‘/’’, the 
search for the file begins at the root of the entire tree; otherwise, it begins at the user’s current directory 
(also known as the working directory). (The first kind of name is often called a full pathname because it 
is invariant with regard to the user’s current directory.) The user may change the current directory at 
any time by using the cd or chdir command. In most cases, a file name and its corresponding pathname 
may be used interchangeably. Some sample names are: 


.               name of the current directory.

..              name of the parent directory of the current directory.

/               root directory of the entire file structure.

/bin            directory containing most of the frequently-used public commands.

/al/tf/jtb/bin  a full pathname typical of multi-person programming projects. This one happens to
                be a private directory of commands belonging to person ‘‘jtb’’ in project ‘‘tf’’; ‘‘al’’
                is the name of a file system.

bin/umail       a name depending on the current directory: it names file ‘‘umail’’ in subdirectory
                ‘‘bin’’ of the current directory. If the current directory is ‘‘/’’, it names
                ‘‘/bin/umail’’. If the current directory is ‘‘/al/tf/jtb’’, it names
                ‘‘/al/tf/jtb/bin/umail’’.

memox           name of a file in the current directory.
2.2 Processes 
Beginners should skip this section on first reading.


An image is a computer execution environment, including memory image, register values, current 
directory, status of open files, information recorded at login time, and various other items. A process is 
the execution of an image; most PWB/UNIX commands execute as separate processes. One process may 
spawn another using the fork system call, which duplicates the image of the original (parent) process.
The new (child) process continues to execute the same program as the parent. The two images are 
identical, except that the program can determine whether it is executing as parent or child. The pro- 
gram may continue execution of the image or may abandon it by issuing an exec system call, thus ini- 
tiating execution of another program. In any case, each process is free to proceed in parallel with the 
other, although the parent quite commonly issues a wait system call to suspend execution until a child
exits. 


Figure 1. (Diagram not reproduced: process 1 runs program A, forks, and sleeps in wait; process 2 continues program A, execs program B, and exits.)


Figure 1 illustrates these ideas. Program A is executing (as process 1) and wishes to run program B. It
‘‘forks’’ and spawns a child (process 2) that continues to execute program A. The child abandons A by
executing B, while the parent goes to sleep until the child exits.
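The fork, exec, and wait sequence of Figure 1 can be written directly with the modern POSIX calls of the same names. In the sketch below, run_and_wait is a hypothetical helper; execvp (which searches the PATH variable) stands in for the plain exec described above, and exit code 127 for a failed exec is a convention borrowed from later shells.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] as a child process, as in Figure 1: fork, exec the new
   program in the child, wait in the parent.
   Returns the child's exit code, or -1 on failure. */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {                 /* child: abandon program A for program B */
        execvp(argv[0], argv);
        _exit(127);                 /* reached only if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)   /* parent sleeps until the child exits */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```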


A child inherits its parent’s open files. This mechanism permits processes to share a common input 
stream in various ways. In particular, an open file possesses a pointer that indicates a position in the file 
and is modified by various operations. Read and write system calls copy a requested number of bytes 
from or to a file, beginning at the position given by the current value of the pointer. As a side effect, 
the pointer is incremented by the number of bytes transferred, yielding the effect of sequential I/O.
Seek can be used to obtain random-access I/O; it sets the pointer to an absolute position within the file, 
or to a position offset either from the end of the file or from the current pointer position. 
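The moving file pointer, and its sharing between parent and child, can be seen in a small C sketch. It assumes a scratch file obtained from tmpfile(); shared_pointer_demo is a hypothetical name. The child writes first, and because the open file (with its pointer) is shared, the parent's later write lands after the child's data instead of overwriting it.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes "abc" through an inherited descriptor; the parent then
   writes "def" through the same descriptor. Returns the final offset
   (6 if the file started empty), or -1 on error. */
off_t shared_pointer_demo(int fd)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(write(fd, "abc", 3) == 3 ? 0 : 1);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    if (write(fd, "def", 3) != 3)       /* continues where the child stopped */
        return -1;
    return lseek(fd, 0, SEEK_CUR);      /* offset 0 from the current position */
}
```

Afterwards, lseek(fd, 0, SEEK_SET) returns to the start for random access, and lseek(fd, -2, SEEK_END) positions the pointer two bytes before the end.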


When a process terminates, it can set an eight-bit return code (or exit code) that is available to its 
parent. This code is usually used to indicate success or failure. 
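A parent retrieves the return code with wait or waitpid. The sketch below, using the hypothetical helpers decode_status and exit_code_seen_by_parent (and an arbitrary sentinel of -1000 for "unusual status"), shows that only the low eight bits reach the parent: a child that calls _exit(300) is seen to return 44, since 300 mod 256 is 44.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Decode a wait status: the exit code for a normal exit, or the negated
   signal number if a signal terminated the process. */
int decode_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* only the low eight bits survive */
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;                       /* stopped or otherwise unusual */
}

/* Fork a child that exits with code and report what the parent sees. */
int exit_code_seen_by_parent(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1000;
    if (pid == 0)
        _exit(code);
    if (waitpid(pid, &status, 0) < 0)
        return -1000;
    return decode_status(status);
}
```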


Signals indicate the occurrence of events that may have some impact on a process. A signal may be 
sent to a process by another process, from the keyboard, or by PWB/UNIX itself. For most types of sig- 
nals, a process can arrange to be terminated on receipt of a signal, to ignore it completely, or to 
‘‘catch’’ it and take appropriate action {4.6}. For example, an interrupt signal may be sent by depressing 
an appropriate key (‘‘del’’, ‘‘break’’, or ‘‘rubout’’). The action taken depends on the requirements of 
the specific program being executed: 


• The Shell invokes most commands in such a way that they immediately die when an interrupt is
received. For example, pr normally dies, allowing the user to terminate unwanted output.

• The Shell itself ignores interrupts when reading from the terminal, because it should continue
execution even when the user terminates a command like pr.

• The editor ed chooses to ‘‘catch’’ interrupts so that it can halt its current action (especially printing)
without terminating completely.
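The three reactions to an interrupt can be demonstrated with signal() and raise(). In this hypothetical sketch, each reaction is tried in a child process, so that the ‘‘terminate’’ case does not kill the demonstration itself. The single-letter dispositions ('d' default, 'i' ignore, 'c' catch) and the name interrupt_outcome are invented for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t interrupts_caught = 0;

/* Handler used when a program chooses to "catch" the signal, as ed does. */
static void on_interrupt(int sig)
{
    (void)sig;
    interrupts_caught++;
}

/* In a child, set SIGINT to default ('d'), ignore ('i'), or catch ('c'),
   then send the child an interrupt. Returns -SIGINT if the child was
   killed, otherwise the number of interrupts it caught. */
int interrupt_outcome(char disposition)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1000;
    if (pid == 0) {
        if (disposition == 'i')
            signal(SIGINT, SIG_IGN);
        else if (disposition == 'c')
            signal(SIGINT, on_interrupt);
        else
            signal(SIGINT, SIG_DFL);
        raise(SIGINT);
        _exit(interrupts_caught);       /* reached unless the signal killed us */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1000;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1000;
}
```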


Limiting interprocess communication to a small number of well-defined methods is a great aid to
uniformity, understandability, and reliability of programs. It encourages the ‘‘packaging’’ of each function
into a small program that is easily connected to other programs, but depends very little on the internal 
workings of other programs. 


3. SHELL BASICS 


The Shell (i.e., the sh command) implements the command language visible to most PWB/UNIX users. 
It reads input from a terminal or a file and arranges for the execution of the requested commands. It is 
a small program (about forty pages of C code); many of its functions are actually provided by indepen- 
dent programs that work with it. It is not part of the operating system, but is an ordinary user program. 
The discussion below is adapted from [10,11]. 


3.1 Commands 


A command is a sequence of non-blank arguments separated by blanks or tabs. The first argument 
(numbered zero) specifies the name of the command to be executed; any remaining arguments are 
passed as character-strings to the command executed. A command may be as simple as: 


who 


which prints the login names of logged-in users. The following line requests the pr command to print 
files a, b, and c: 


pr a b c


If the first argument names a file that is executable² and is actually a load module, the Shell (as parent)
spawns a new (child) process that immediately executes that program. If the file is marked executable,
but is neither a load module nor a directory, it is assumed to be a Shell procedure, i.e., a file of ordinary
text containing Shell command lines and possibly lines to be read by other programs. In this case, the
Shell spawns a new instance of itself to read the file and execute the commands included in it. The
following command requests that the on-line PWB/UNIX User’s Manual [3] pages for the who and pr
commands be printed on the terminal (the man command is actually implemented as a Shell procedure):


man who pr 


2. As evidenced by an appropriate set of permission bits associated with that file. 




From the user’s viewpoint, executable programs and Shell procedures are invoked in exactly the same
way. The Shell determines which implementation has been used, rather than requiring the user to do
so. This preserves the uniformity of invocation and the ease of changing the implementation choice.
The actions of the Shell in executing any of these commands are illustrated in Figure 1 {2.2}.


3.2 Redirection of Standard Input and Output 


When a command begins execution, it usually expects that three files are already open, a ‘‘standard
input’’, a ‘‘standard output’’, and a ‘‘diagnostic output’’. When the user’s original Shell is started, all
three have already been opened to the user’s terminal. A child process normally inherits these files 
from its parent. The Shell permits them to be redirected elsewhere before control is passed to an 
invoked command. 


An argument to the Shell of the form ‘‘<file’’ or ‘‘>file’’ opens the specified file as standard input or 
output, respectively. An argument of the form ‘‘>>file’’ opens the standard output to the end of the 
file, thus providing a way to append data to it. In either output case, the Shell creates the file if it did
not already exist. The following appends to file ‘‘log’’ the list of users who are logged in:


who >>log 


In general, most commands neither know nor care whether their input (output) is coming from (going 
to) a terminal or file. Thus, commands can be used conveniently in many different contexts. A few 
commands vary their actions depending on the nature of their input or output, either for efficiency’s 
sake, or to avoid useless actions (such as attempting random-access I/O on a terminal). 
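The ‘‘>>file’’ form can be sketched in C: the Shell's child opens the file for appending (creating it if needed) and installs it as file descriptor 1 before the command starts, so the command itself never knows. The helper run_appending is hypothetical, and the creation mode 0666 (further limited by the umask) is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of "command >>path": run argv in a child whose standard output
   is opened at the end of path. Returns the child's exit code. */
int run_appending(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: open for appending and make it descriptor 1 (stdout). */
        int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
        if (fd < 0 || dup2(fd, 1) < 0)
            _exit(126);
        if (fd != 1)
            close(fd);
        execvp(argv[0], argv);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```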


Redirection of the diagnostic output is discussed in {4.7.3}. 
3.3 Command Lines 


A sequence of commands separated by ‘‘|’’ (or ‘‘^’’) makes up a pipeline. Each command is run as a
separate process connected to its neighbor(s) by pipes, i.e., the output of each command (except the last
one) becomes the input of the next command in line. A filter is a command that reads its input,
transforms it in some way, then writes it as output. A pipeline normally consists of a series of filters.
Although the processes in a pipeline are permitted to execute in parallel, they are synchronized to the 
extent that each program needs to read the output of its predecessor. Many commands operate on indi- 
vidual lines of text, reading a line, processing it, writing it, and looping back for more input. Some 
must read larger amounts of data before producing output; sort is an example of the extreme case that 
requires all input to be read before any output is produced. 


The following is an example of a typical pipeline: nroff is a text formatter whose output may contain
reverse line motions; col converts these motions to a form that can be printed on a terminal lacking
reverse motion capability; reform is used here to speed printing by converting the (tab-less) output of
col to an equivalent one containing horizontal tab characters. The flag ‘‘-mm’’ indicates one of the
more-commonly used formatting options, and ‘‘text’’ is the name of the file to be formatted:


nroff -mm text | col | reform


Figure 2 shows the sequence of actions that set up this pipeline. Not shown are actions by the Shell
that create pipes and manipulate open files, causing the commands to be tied together correctly. 
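The pipe manipulation that Figure 2 omits looks roughly like this for two commands. The sketch is hypothetical (run_pipeline2 is an invented name, and it handles only two commands): it creates a pipe, points the first child's standard output and the second child's standard input at its two ends, and closes the unused ends so that the reader sees end of file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of "cmd1 | cmd2" with cmd2's output sent to out_fd.
   Returns cmd2's exit code, or -1 on error. */
int run_pipeline2(char *const cmd1[], char *const cmd2[], int out_fd)
{
    int p[2], status;
    pid_t left, right;

    if (pipe(p) < 0)
        return -1;
    left = fork();
    if (left < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (left == 0) {                  /* first command: stdout -> pipe */
        dup2(p[1], 1);
        close(p[0]);
        close(p[1]);
        execvp(cmd1[0], cmd1);
        _exit(127);
    }
    right = fork();
    if (right == 0) {                 /* second command: pipe -> stdin */
        dup2(p[0], 0);
        dup2(out_fd, 1);
        close(p[0]);
        close(p[1]);
        execvp(cmd2[0], cmd2);
        _exit(127);
    }
    /* The parent must close both ends, or cmd2 never sees end of file. */
    close(p[0]);
    close(p[1]);
    waitpid(left, NULL, 0);
    if (right < 0 || waitpid(right, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```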


A command line consists of zero or more pipelines separated by semicolons or ampersands. If the last
command in a pipeline is terminated by a semicolon (;) or a new-line character, the Shell waits for the 
command to finish before continuing to read command lines. It does not wait if the pipeline is ter- 
minated by an ampersand (&); both sequential and asynchronous execution are thus allowed. An asyn- 
chronous pipeline continues execution until it terminates voluntarily, or until its processes are killed. 
The first example below executes who, waits for it to terminate, and then executes date; the second 
invokes both commands in order, but does not wait for either one to finish. Figure 3 shows the actions 
of the Shell involved in executing these examples:


who >log; date 
who >log& date& 


Figure 2. (Diagram not reproduced: sh forks three times and then waits three times, sleeping while its children exec nroff, col, and reform; each child exits when done.)


Figure 3. (Diagram not reproduced: for ‘‘who >log; date’’, sh forks and waits while the child execs who, then forks and waits while the child execs date. For ‘‘who >log& date&’’, sh forks twice without waiting and is free to do other commands while who and date run and exit.)


More typical uses of ‘‘&’’ include off-line printing, background compilation, and generation of jobs to
be sent to other computers. For example: 


nohup cc prog.c& 
You continue working while the C compiler runs in background. 


A command terminated by ‘‘&’’ is immune to interrupts, but it is wise to make it immune to hang-ups
as well. The nohup command is used for this purpose. Without nohup, if you hang up while cc (the C
compiler) is still executing, cc will be killed and your output will disappear.


The ‘‘&’’ operator should be used with restraint, especially on heavily-loaded systems. Other users will
not consider you a good citizen if you start up a large number of simultaneous, asynchronous processes 
without a compelling reason for doing so. 


A simple command in a pipeline may be replaced by a command line enclosed in parentheses ‘‘()’’; in 
this case, another instance of the Shell is spawned to execute the commands so enclosed. This action is 


¢ 


S68 


helpful in combining the output of several sequentially executed commands into a stream to be pro- 
cessed by a pipeline. The following line prints two separate documents in a way similar to that shown 
in a previous example: 


(nroff -mm text1; nroff -mm text2) | col | reform
3.4 Generation of Argument Lists 


Many command arguments are names of files. When certain characters are found in an argument, they 
cause replacement of that argument by a sorted list of zero or more file names obtained by
pattern-matching on the contents of a directory. Most characters match themselves. The ‘‘?’’ matches any one
character; the ‘‘*’’ matches any string of any characters (other than ‘‘/’’), including the null string.
Enclosing a set of characters within square brackets ‘‘[...]’’ causes the construct to match any one
character in that set.³ Inside square brackets, a pair of characters separated by ‘‘-’’ includes in the set
all characters lexically within the inclusive range of that pair.


For example, ‘‘*’’ matches all files in the current directory, ‘‘*tmp*’’ matches all names containing
‘‘tmp’’, ‘‘[a-f]*’’ matches all files whose names begin with ‘‘a’’ through ‘‘f’’, ‘‘*.c’’ matches all files
ending in ‘‘.c’’, and ‘‘/al/tf/bin/?’’ matches all single-character names found in ‘‘/al/tf/bin’’. This
capability saves much typing, and more importantly, makes it possible to organize information as large 
collections of small files that are named in disciplined ways. 
Pattern-matching has several restrictions. If the first character of a file name is ‘‘.’’, it can be matched
only by an argument that begins with ‘‘.’’. Pattern-matching is currently restricted to the last
component in a pathname: the string ‘‘/al/tf/*’’ is legal, but the string ‘‘/al/*/bin’’ is not.
Pattern-matching does not apply to the name of the invoked command (i.e., argument number 0).
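Modern systems provide this kind of name matching as the POSIX fnmatch() routine. The sketch below assumes that the flags FNM_PATHNAME (so that ‘‘*’’ and ‘‘?’’ never match ‘‘/’’) and FNM_PERIOD (so that a leading ‘‘.’’ must be matched explicitly) approximate the rules above. It tests one name at a time; the directory scan, the sorting, and the last-component restriction are not modeled, and name_matches is a hypothetical wrapper.

```c
#include <assert.h>
#include <fnmatch.h>

/* Return 1 if name matches the Shell-style pattern, 0 otherwise. */
int name_matches(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, FNM_PATHNAME | FNM_PERIOD) == 0;
}
```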


3.5 Quoting Mechanisms 


If a character has a special meaning to the Shell, that meaning may be removed by preceding the char- 
acter with a back-slash (\); the ‘‘\’’ acts as an escape and disappears. A ‘‘\’’ followed by a new-line 
character is treated as a blank, permitting continuation of commands on additional input lines. A 
sequence of characters enclosed in single quotes ('...') is taken literally: ‘‘what you see is what you
get’’. The beginner should use single quotes in most instances. Double quotes ("...") are required in 
a few cases, primarily inside Shell procedures. Double quotes hide the significance of most special 
characters, but allow substitution of Shell arguments and variables; see {4.8} for further details. 


3.6 Examples 


The following examples illustrate the variety of effects that can be obtained by combining a few com- 
mands in the ways described above. It may be helpful to try these examples at a terminal: 


• who
Print (on the terminal) the list of logged-in users.


• who >>log
Append the list of logged-in users to the end of file ‘‘log’’.


• who | wc -l
Print the number of logged-in users. (The argument to wc is ‘‘minus ell’’.)


• who | pr
Print a paginated list of logged-in users.


• who | sort
Print an alphabetized list of logged-in users.


• who | grep pw
Print the list of logged-in users whose login names contain ‘‘pw’’.


• who | grep pw | sort | pr
Print an alphabetized, paginated list of logged-in users whose names contain ‘‘pw’’.


3. Be warned that square brackets are also used below in an entirely different sense: in descriptions of commands, they indicate 
that the enclosed argument is optional. 


• (date; who | wc -l) >>log
Append (to ‘‘log’’) the current date followed by the count of logged-in users.


• who | sed 's/ .*//' | sort | uniq -d
Print only the login names of all users who are logged in more than once.


The who command does not by itself provide options to yield all these results; they are obtained by
combining it with other commands. The kinds of operations illustrated above may be used in other
circumstances; who just serves as the data source in these examples. As an exercise, replace ‘‘who |’’ by
‘‘</etc/passwd’’ in the above examples to see how a file can be used as a data source in the same way.
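The last filter in the final example, uniq -d, is simple enough to sketch in C. The hypothetical uniq_d below assumes, as the pipeline arranges with sed and sort, that its input has already been reduced to login names and sorted, so duplicate lines are adjacent; it writes one copy of each repeated line and assumes lines shorter than 1024 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write one copy of each line that appears more than once in a row.
   Returns the number of lines written. */
int uniq_d(FILE *in, FILE *out)
{
    char prev[1024] = "", cur[1024];
    int have_prev = 0, printed = 0, written = 0;

    while (fgets(cur, sizeof cur, in) != NULL) {
        if (have_prev && strcmp(cur, prev) == 0) {
            if (!printed) {             /* first repeat: report it once */
                fputs(cur, out);
                written++;
                printed = 1;
            }
        } else {                        /* a new run of lines starts here */
            strcpy(prev, cur);
            have_prev = 1;
            printed = 0;
        }
    }
    return written;
}
```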


3.7 How the Shell Finds Commands 


The Shell normally searches for commands in a way that permits them to be found in three distinct 
locations in the file structure. The Shell first attempts to use the command name as given; if this fails, 
it prepends the string ‘‘/bin/’’ to the name, and, finally, ‘‘/usr/bin/’’. The effect is to search, in order, 
the current directory, ‘‘/bin’’, and ‘‘/usr/bin’’. For example, the pr and man commands are actually 
located in files ‘‘/bin/pr’’ and ‘‘/usr/bin/man’’, respectively. A more complex pathname may be 
given, either to locate a file relative to the user’s current directory, or to access a command via an abso- 
lute pathname. If a command name as given contains a ‘‘/”’ (e.g., ‘‘/bin/sort’’ or ‘‘../cmd’’), the 
prepending is not performed. Instead, a single attempt is made to execute the unmodified command 
name. 
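The search order can be written out as a short C sketch. The helpers candidate_paths and find_command are hypothetical; access() with X_OK is used as an approximation of ‘‘can be executed’’ (a directory would also pass it), and, unlike the Shell, the sketch only locates a file and does not run it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PATHLEN 256

/* Fill cand with the paths the Shell tries, in order: the name as given,
   then "/bin/name" and "/usr/bin/name", unless the name contains "/".
   Returns the number of candidates (1 or 3). */
int candidate_paths(const char *name, char cand[3][PATHLEN])
{
    static const char *const prefixes[] = { "", "/bin/", "/usr/bin/" };
    int n = strchr(name, '/') != NULL ? 1 : 3;

    for (int i = 0; i < n; i++)
        snprintf(cand[i], PATHLEN, "%s%s", prefixes[i], name);
    return n;
}

/* Return the first candidate that exists and is executable, or NULL. */
const char *find_command(const char *name, char cand[3][PATHLEN])
{
    int n = candidate_paths(name, cand);

    for (int i = 0; i < n; i++)
        if (access(cand[i], X_OK) == 0)
            return cand[i];
    return NULL;
}
```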


This mechanism gives the user a convenient way to execute public commands and commands in or
‘‘near’’ the current directory, as well as the ability to execute any accessible command regardless of its
location in the file structure. Because the current directory is usually searched first, anyone can possess
a private version of a public command without interfering with other users. Similarly, the creation of a
new public command will not affect a user who already has a private command with the same name. 
This mechanism may be overridden {4.4}. 


3.8 Changing the State of the Shell and the .profile File


The state of a given instance of the Shell may be altered in various ways. The following commands are 
used more often at the terminal than in Shell procedures. 


The cd command (or its synonym chdir) changes the current directory of the Shell to the one specified.
This can (and should) be used to change to a convenient place in the directory structure; cd is often
combined with "()" to cause a sub-Shell to change to a different directory and execute some
commands, without affecting the original Shell. The first sequence below extracts the component files of
the archive file "/al/tf/q.a" and places them in whatever directory is the current one; the second
places them in directory "/al/tf":


ar x /al/tf/q.a 
(cd /al/tf; ar x q.a) 


The opt command sets various flags in the Shell. For example, "opt -p prompt-str" changes the
Shell's interactive prompt sequence from "% " to prompt-str.⁴ Typing "opt -v" causes the Shell to
enter verbose mode, in which it prints each command line before executing it {4.1}. Try this at the
terminal to see how the Shell scans arguments. The output can be turned off by typing "opt +v".


The login command causes the Shell to execute the login program directly, permitting a new login
without re-dialing. A related command is su, which permits you to act with someone else's access
permissions without making you log in again.


Wait causes the Shell to suspend execution until all of its child processes have terminated. It is used to 
assure termination of asynchronous processes. 


When you log in or use su, the Shell is invoked to read your commands; if your current directory
contains a file named ".profile", the Shell reads it first, before reading commands from your terminal.
".profile" often contains commands that set tab stops and terminal delays, read mail, etc. See
".profile" in {6}.


4. The default prompt string "% " is inconvenient for certain display (CRT) terminals.
4. USING THE SHELL AS A COMMAND: SHELL PROCEDURES
4.1 Invoking the Shell 
The Shell is an ordinary command and may be invoked in the same way as other commands: 


sh file [ args ] A new instance of the Shell is explicitly invoked to read file. Arguments, if any, 
can be manipulated as described in {4.2}. 


sh -v file [ args ] This is equivalent to putting "opt -v" at the beginning of file. Each command
line in file is printed before it is executed, thus tracing the progress of execution.
This is an important debugging aid.


file [ args ] If file is marked executable, and is neither a directory nor a load module, the
effect is that of "sh file [ args ]", except that file may be found by the search
procedure described in {3.7}.


4.2 Passing Arguments to the Shell


When a command line is scanned, any character sequence of the form $n is replaced by the nth
argument to the Shell, counting the name of the file being read as $0. This notation permits direct
reference to the file name and up to 9 arguments. Additional arguments can be processed using the shift
command. It shifts arguments to the left; i.e., the value of $1 is thrown away, $2 replaces $1, $3
replaces $2, etc.; the rightmost argument becomes null. For example, consider the (executable) file
"ripple" below. Echo writes its arguments to the standard output; if, exit, and goto are discussed later,
but perform fairly obvious functions.⁵ The form "$1" is used rather than '$1' because it is the value of
the first argument that is desired, rather than the literal two-character string $1:


: loop
if "$1" = "" exit
echo $1 $2 $3 $4 $5 $6 $7 $8 $9
shift
goto loop


If the file were invoked by "ripple a b c", it would print:


a b c
b c
c


The "shift n" form of shift has no effect on the arguments to the left of the nth argument; the nth
argument is discarded, and the higher-numbered ones shifted. Thus, "shift" is equivalent to "shift 1"
(as is "shift 0").
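A minimal Python model of shift, assuming the argument list is held in a Python list whose element 0 stands for $0. What happens when n exceeds the number of arguments (here, nothing) is an assumption; the text does not say.

```python
def shift(args, n=1):
    """Apply "shift n" in place; args[0] plays the role of $0."""
    if n == 0:              # "shift 0" behaves like "shift" and "shift 1"
        n = 1
    if n < len(args):       # arguments to the left of $n are not touched
        del args[n]         # $n is discarded; higher ones move down by one
    return args

def arg(args, i):
    """Value of $i; an argument that no longer exists is the null string."""
    return args[i] if i < len(args) else ""

args = ["ripple", "a", "b", "c"]
shift(args, 2)
print(args)                               # ['ripple', 'a', 'c']
shift(args)
print(arg(args, 1), repr(arg(args, 2)))   # c ''
```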


The notation $* causes substitution of all current arguments except $0. Thus, the echo line in the
"ripple" example above could be written in a better way as:


echo $*


These two echo commands are not equivalent: the first prints at most nine arguments; the second prints 
all its arguments. The $* notation is more concise and is less error-prone. One obvious application is 
in passing an arbitrary number of arguments to the nroff text formatter: 


nroff -nh -rTl -T450 -mm $*


It is important to understand the sequence of actions used by the Shell in substituting arguments. 
First, the Shell reads one line of input, making all substitutions in a single pass; no rescanning is per- 
formed. Second, the Shell parses the resulting line. Third, the Shell executes all of the commands in 
that line. Thus, it is impossible for a command in a line to affect the argument values substituted into 
that same line. For example, the following sequence prints the same value twice, because the shift has 
no effect on the line in which it appears: 


echo $1; shift; echo $1 


5. Much better ways of coding this procedure are shown later. Lines that begin with ":" are labels and/or comments {4.5.1}.


On the other hand, the next sequence prints the first argument, followed by the second: 


echo $1 
shift 
echo $1 
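The following toy interpreter, a sketch that understands only echo, shift, and $0 through $9, makes the three-step order concrete: substitution is done once over the whole line before any command in that line runs.

```python
import re

def arg(args, i):
    """Value of $i, or the null string if there is no such argument."""
    return args[i] if i < len(args) else ""

def run_line(line, args, out):
    """Run one input line of a toy Shell that knows only echo and shift."""
    # Step 1: substitute $0..$9 over the whole line in one pass (no rescanning).
    expanded = re.sub(r"\$([0-9])", lambda m: arg(args, int(m.group(1))), line)
    # Steps 2 and 3: split the line into commands and execute them in order.
    for command in expanded.split(";"):
        words = command.split()
        if not words:
            continue
        if words[0] == "echo":
            out.append(" ".join(words[1:]))
        elif words[0] == "shift" and len(args) > 1:
            del args[1]

out = []
run_line("echo $1; shift; echo $1", ["proc", "first", "second"], out)
print(out)    # ['first', 'first']

out, args = [], ["proc", "first", "second"]
for line in ["echo $1", "shift", "echo $1"]:
    run_line(line, args, out)
print(out)    # ['first', 'second']
```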


4.3 Shell Variables 


The Shell provides 26 string variables, $a through $z. Those in the range $a through $m are initialized
to null strings at the beginning of execution and are never modified except by explicit user request.
Some variables in the range $n through $z have specific initial values and may possibly be changed
implicitly by the Shell during execution. A variable is assigned a value as follows:


= letter [ arg1 [ arg2 ] ]


If arg1 is given, its value is assigned to the variable corresponding to letter. If two arguments are given,
and if arg1 is a null string, the value of arg2 is assigned to the variable, permitting a convenient default
mechanism. If neither arg1 nor arg2 is given, a single line is read from the standard input, and the
resulting string (with the new-line character, if any, removed) is assigned to the variable.
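The three cases of "=" can be summarized in a short Python sketch. The dictionary of variables and the file-like stdin object are illustrative stand-ins, and what happens at end of file is an assumption (here the variable simply receives the null string).

```python
import io

def assign(variables, letter, args, stdin):
    """Sketch of "= letter [ arg1 [ arg2 ] ]" updating a dict of variables."""
    if args:
        value = args[0]
        if len(args) == 2 and value == "":
            value = args[1]             # arg2 is a default for a null arg1
    else:
        value = stdin.readline()        # no arguments: read one line
        if value.endswith("\n"):
            value = value[:-1]          # the new-line is removed
        # assumption: at end of file readline() gives "", so the variable is null
    variables[letter] = value

shell_vars = {}
assign(shell_vars, "a", ["", "/usr/news/.mail"], None)     # $1 was null
assign(shell_vars, "b", ["ssa02"], None)                   # = b 'ssa02'
assign(shell_vars, "c", [], io.StringIO("450\nmore\n"))    # reads one line
print(shell_vars)
# {'a': '/usr/news/.mail', 'b': 'ssa02', 'c': '450'}
```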


The following are examples of simple assignments. You may omit quotes around the arguments if you 
are sure that they contain no special characters: 


= a "$1"
= b 'ssa02'
= c /usr/news/.mail


The procedure below illustrates the use of a default argument. If an argument is given, mail is read
from it. Otherwise, mail is read from "/usr/news/.mail":


= a "$1" /usr/news/.mail
mail -f $a

The "=" command is often used to capture the output of a program. For example, date writes the
current time and date to its standard output. The following line saves this value in $d:

date | = d

This works just as well in longer pipelines. The following saves in $a the number of logged-in users:

who | wc -l | = a


Another use is in the writing of interactive Shell procedures. The following example is part of a
procedure to ask the user what kind of terminal is being used, so that tabs and delays can be set and other
useful actions taken. The "</dev/tty" indicates a redirection of the standard input to the user's
terminal; it is not seen as an argument to "=", but rather causes the variable to be set to the next line typed
by the user:


echo 'terminal?'
= a </dev/tty 


Several variables are currently assigned special meanings: 


$n records the number of arguments passed to the Shell, not counting the name of the Shell
procedure itself. Thus, "sh file arg1 arg2 arg3" sets $n to 3. Its primary use is in checking for the
required number of arguments:


if $n -lt 2 then
echo 'two or more args required'; exit
endif


Shift never changes the value of $n. 


$p permits alteration of the ordered list of directory pathnames used when searching for commands. 
It contains a sequence of directory names (separated by colons) that are to be used as search 
prefixes, ordered from left to right. The current directory is indicated by a null string. 
By default, $p is initialized to a value producing the effect described in {3.7}: ":/bin:/usr/bin". A
user could possess a personal directory of commands (say, /al/tf/jtb/bin) and cause it to be
searched before the other three directories by using:


= p /al/tf/jtb/bin::/bin:/usr/bin 
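Turning a $p value into the list of prefixes it implies is a simple split on colons; the sketch below (illustrative Python, not the Shell's own code) shows how a null component stands for the current directory.

```python
def search_prefixes(p):
    """Prefixes tried, left to right, for a $p value."""
    # Components are separated by colons; a null component means the
    # current directory, so nothing (not even "/") is prepended for it.
    return ["" if d == "" else d + "/" for d in p.split(":")]

print(search_prefixes(":/bin:/usr/bin"))
# ['', '/bin/', '/usr/bin/']
print(search_prefixes("/al/tf/jtb/bin::/bin:/usr/bin"))
# ['/al/tf/jtb/bin/', '', '/bin/', '/usr/bin/']
```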


$r gives the value of the return code of the command most recently executed by the Shell. It is a
string of digits; most commands return "0" to indicate successful completion. For example, the
"=" command returns "0" if two arguments are given and the first is not null, or if a line is
actually read from the input. When the Shell terminates, it returns the current value of $r as its
own return code.


$s is initialized to the name of the user's login directory, i.e., the directory that becomes the current
directory upon completion of a login (e.g., "/al/tf/jtb"). Using this variable helps one to keep
full pathnames out of Shell procedures. This is of great benefit when pathnames are changed,
either to balance disk loads or to reflect administrative changes.


$t is initialized to the user's terminal identification, a single letter or digit. The terminal can be
manipulated using the file name "/dev/tty$t" or just "/dev/tty" alone. The latter is a generic
name for the user's terminal.


$w is initialized to the first component of $s, i.e., it is the name of the file system (such as "/al") in
which the login directory is located. Like $s, it is used to avoid pathname dependencies, but is
more useful than $s for projects involving many users.


$z is initialized to "/bin/sh". The command named by $z is the one that actually reads the Shell
procedures invoked implicitly. The user can alter the choice of the Shell by overriding this value
{4.4}. This facility is very useful when there are several different Shells in a system. This may
occur because different groups of users want different Shells, or when a new Shell is being tested.


In addition to the above variables, the following read-only variable is provided: 


$$ contains a 5-digit number that is the unique process number of the current Shell. Its most com- 
mon use is in generating unique names for temporary files. Unlike many other systems, 
PWB/UNIX provides no separate mechanism for the automatic creation and deletion of temporary 
files: a file exists until it is explicitly removed. Temporary files are generally undesirable objects: 
the PWB/UNIX pipe mechanism is far superior for many applications. However, the need for 
uniquely-named temporary files does occur, especially for multi-user database applications. The 
following example of $$ usage also illustrates the helpful practice of creating temporary files in a 
directory used only for that purpose: 


ls >$s/tmp/$$
... commands (some of which use $s/tmp/$$)
: 'clean up at end'
rm $s/tmp/$$
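An approximate Python analogue of the $$ idiom, assuming the login directory already contains a tmp subdirectory. os.getpid() plays the role of $$; the try/finally only guarantees the explicit removal that the procedure performs with rm. count_entries is a hypothetical stand-in for "commands that use the file", and it lists the login directory rather than the current one.

```python
import os
import tempfile

def temp_name(home):
    """Analogue of $s/tmp/$$: unique among processes running at the same time."""
    return os.path.join(home, "tmp", str(os.getpid()))

def count_entries(home):
    """Hypothetical use: list home into a temporary file, read it back, remove it."""
    path = temp_name(home)
    with open(path, "w") as f:                      # roughly: ls >$s/tmp/$$
        f.write("\n".join(sorted(os.listdir(home))) + "\n")
    try:
        with open(path) as f:                       # ... commands that use it
            return len(f.read().split())
    finally:
        os.remove(path)                             # like: rm $s/tmp/$$

with tempfile.TemporaryDirectory() as home:         # stands in for $s
    os.mkdir(os.path.join(home, "tmp"))
    print(count_entries(home))                      # 1 (only "tmp" itself)
```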


4.4 Initialization of $p and $z by the .path File


The user may request automatic initialization of each Shell's $p (and $z) by creating a file named
".path" in the login directory. The first (or only) line should be of the form shown for $p {4.3}. If
present, the second line should be the full pathname of a Shell. Every instance of the Shell looks for
that ".path" file and initializes its own $p (and $z) from it, if ".path" exists. Otherwise,
":/bin:/usr/bin" and "/bin/sh" are the values used, respectively. Thus, the ".path" information is
available to all of the user's Shells, but changing $p or $z in one Shell does not affect these variables in
other Shells. In addition, ".path" is used in a consistent way by commands that must search for other
commands, such as nohup, nice, and time.⁶ This facility is heavily used in large projects, because it
simplifies the sharing of procedures, and can be quickly altered to adapt to changes in organizational
requirements.
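The rule for reading ".path" can be sketched as a small Python function. It takes the file's contents, or None when no ".path" exists; treating an empty file like a missing one is an assumption, and the second example's Shell pathname is hypothetical.

```python
DEFAULT_P = ":/bin:/usr/bin"
DEFAULT_Z = "/bin/sh"

def init_from_path_file(text):
    """Initial ($p, $z) for a new Shell, given .path contents or None."""
    if text is None:                       # no .path in the login directory
        return DEFAULT_P, DEFAULT_Z
    lines = text.splitlines()
    p = lines[0] if lines else DEFAULT_P   # assumption: empty file acts as absent
    z = lines[1] if len(lines) > 1 else DEFAULT_Z
    return p, z

print(init_from_path_file("/al/tf/jtb/bin::/bin:/usr/bin\n"))
# ('/al/tf/jtb/bin::/bin:/usr/bin', '/bin/sh')
print(init_from_path_file(":/bin:/usr/bin\n/al/tf/jtb/bin/newsh\n"))
# (':/bin:/usr/bin', '/al/tf/jtb/bin/newsh')
```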


6. If you plan to write such a command, investigate the pexec subroutine, which combines the search and execution code. 
4.5 Control Structures


The Shell provides several commands that implement a variety of control structures. These commands 
are presented here in order of increasing complexity. See {6} for examples of these commands in the 
context of complete Shell procedures. 


☞ Several of the control commands must not be "hidden" on command lines (e.g., behind semicolons ";"):
else   end   endif   endsw   if   switch   while
Other control commands may be "hidden":
break   breaksw   continue   =   exit   goto   next


4.5.1 Labels and Goto. The command ":" is recognized by the Shell, but is then treated as a null
operation. One use of ":" is to define a label to act as a target for goto. Another use is to begin a
comment line. However, it is a good idea to place comments in quotes {3.5} if they contain any
characters that have a special meaning to the Shell, because the line is actually parsed, not just ignored.
Using "goto label" causes the following actions:


• A seek is performed to move the read pointer to the beginning of the command file.

• The file is scanned from the beginning, searching for ": label", either alone on a line, or followed
by a blank or tab.

• The read pointer is made to point at the line after the labeled line.


Thus, the only effect of goto is the adjustment of the Shell's file read pointer to cause the Shell to
resume interpreting commands starting at the line following the labeled line. Invoking goto with an
undefined label causes termination of the procedure {4.5.5}.
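The three steps performed by goto amount to a linear scan from the top of the file. The Python sketch below assumes the label line is written exactly as ": label" with one blank after the colon, and returns None where the real Shell would terminate the procedure.

```python
def goto(lines, label):
    """Index of the line after ": label", or None if the label is undefined."""
    target = ": " + label
    for i, line in enumerate(lines):          # rewind, then scan from the top
        if line == target or line.startswith((target + " ", target + "\t")):
            return i + 1                      # resume at the following line
    return None                               # the Shell would terminate here

ripple = [": loop", 'if "$1" = "" exit', "echo $*", "shift", "goto loop"]
print(goto(ripple, "loop"))     # 1
```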


☞ Avoid the goto; future versions of the Shell are not expected to allow it.

4.5.2 If: Simple Conditional.

if conditional-expression command [ args ]


Whenever the conditional-expression is found to be true, if executes the command (via the exec system
call), passing the arguments to it. Whenever the conditional-expression is false, if merely exits.


The following primaries can be used to construct the conditional-expression: 


-r file true if the named file exists and is readable by the user.

-w file true if the named file exists and is writable by the user.

-s file true if the named file exists and has a size greater than zero.

-d file true if the named file is a directory.

-f file true if the named file is an ordinary file.

s1 = s2 true if strings s1 and s2 are identical.

s1 != s2 true if strings s1 and s2 are not identical.

n1 -eq n2 true if the integers n1 and n2 are algebraically equal. Other algebraic comparisons are
indicated by "-ne", "-gt", "-ge", "-lt", and "-le".


{ command } the command is executed; a return code of 0 (yes, zero!) is considered true, any other
value is considered false. Most commands return 0 to indicate successful completion.


These primaries may be combined with the following operators: 


! unary negation operator.

-a binary logical and operator.

-o binary logical or operator; it has lower precedence than "-a".

( expr ) parentheses for grouping. They must be escaped to remove their significance to the
Shell. In the absence of parentheses, evaluation proceeds from left to right.
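The grammar implied by these primaries and operators (! binding tightest, then -a, then -o, with left-to-right evaluation otherwise) can be expressed as a small recursive-descent evaluator. This Python sketch works on an already-split list of words, covers the file, string, and integer primaries but not the { command } form, and raises SyntaxError where if would report a syntax error; it is an illustration, not the Shell's own parser.

```python
import operator
import os

FILE_TESTS = {
    "-r": lambda f: os.access(f, os.R_OK),
    "-w": lambda f: os.access(f, os.W_OK),
    "-s": lambda f: os.path.exists(f) and os.path.getsize(f) > 0,
    "-d": os.path.isdir,
    "-f": os.path.isfile,
}
INT_TESTS = {"-eq": operator.eq, "-ne": operator.ne, "-gt": operator.gt,
             "-ge": operator.ge, "-lt": operator.lt, "-le": operator.le}

class Condition:
    def __init__(self, words):
        self.words, self.pos = list(words), 0

    def peek(self):
        return self.words[self.pos] if self.pos < len(self.words) else None

    def take(self):
        if self.pos >= len(self.words):
            raise SyntaxError("if: missing argument")
        self.pos += 1
        return self.words[self.pos - 1]

    def or_expr(self):                  # lowest precedence: -o
        value = self.and_expr()
        while self.peek() == "-o":
            self.take()
            right = self.and_expr()     # always parsed, so its words are consumed
            value = value or right
        return value

    def and_expr(self):                 # -a binds tighter than -o
        value = self.unary()
        while self.peek() == "-a":
            self.take()
            right = self.unary()
            value = value and right
        return value

    def unary(self):                    # ! applies to the next term
        if self.peek() == "!":
            self.take()
            return not self.unary()
        return self.primary()

    def primary(self):
        word = self.take()
        if word == "(":
            value = self.or_expr()
            if self.take() != ")":
                raise SyntaxError("if: ) expected")
            return value
        if word in FILE_TESTS:          # -r file, -w file, ...
            return FILE_TESTS[word](self.take())
        op, rhs = self.take(), self.take()
        if op == "=":
            return word == rhs
        if op == "!=":
            return word != rhs
        if op in INT_TESTS:
            return INT_TESTS[op](int(word), int(rhs))
        raise SyntaxError("if: unknown operator " + op)

def evaluate(words):
    """True or False for a conditional-expression given as separate words."""
    cond = Condition(words)
    value = cond.or_expr()
    if cond.peek() is not None:
        raise SyntaxError("if: unexpected " + cond.peek())
    return value

print(evaluate(["a", "=", "a", "-o", "a", "=", "b", "-a", "x", "=", "y"]))  # True
```

The last line shows the precedence rule at work: it is read as a = a -o ( a = b -a x = y ), which is true, whereas strict left-to-right grouping would give false.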
All of the operators, flags, and values are separate arguments to if and must be separated by blanks.
You must be careful to make sure that an argument actually appears and can be parsed correctly: 


if "$1" = "" echo missing argument 
if 0$1 = 0 echo missing argument 
if 0"$1" = 0 echo missing argument


The first example guards against the possibility that $1 is omitted, null, or has embedded blanks; the
second guards against the possibility that $1 has a value that causes parsing problems (such as "-r"),
or that it is omitted or null; the third guards against all these problems. The following is dangerous:


if $1 = "" echo missing argument 


because it would cause a syntax error in any of the above cases. Substitution of variables and argu- 
ments occurs effectively before parsing; thus, for example, if $1 were null, then after substitution the 
line would read: 


if = "" echo missing argument 


In this case, $1 without quotes yields no argument at all (on the other hand, "$1" would have yielded
an argument whose value is the null string). It is generally desirable to quote arguments (with double
quotes; see {3.5}), especially when they might possibly contain blanks or other characters that have a
special meaning to the Shell. Examples of the use of if can be found in {6}.


4.5.3 If-then-else-endif: Structured Conditional. A more general (and much more readable) form of
if can be used:


if conditional-expression then
... commands
else
... commands
endif


The else and the commands following it may be omitted. It is legal to nest if commands, but there
must be an endif to match every then.


When if is called with a command, using the form of {4.5.2}, it acts as described there, deciding
whether or not to execute the supplied command. When called with then instead of another command,
if simply exits on a true, allowing the Shell to read and interpret the immediately following lines. On a
false, if reads the file until it finds the next unmatched else or endif, thus skipping it and any other
intervening lines. Else reads to the next unmatched endif. Endif is a null command.
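Because if, else, and endif only move the read pointer, the skipping they do can be modeled as a scan that counts nested if-then blocks. This Python sketch assumes a structured if is recognized by a line whose first word is if and whose last word is then; the procedure lines are hypothetical.

```python
def skip_to(lines, i, stop_at_else):
    """Index of the line where reading resumes, scanning from lines[i].

    stop_at_else=True models a false "if ... then" (stop after the next
    unmatched else or endif); False models "else" (stop after endif).
    """
    depth = 0                                   # nesting of inner if-then blocks
    while i < len(lines):
        words = lines[i].split()
        first = words[0] if words else ""
        if first == "if" and words[-1] == "then":
            depth += 1
        elif first == "endif":
            if depth == 0:
                return i + 1
            depth -= 1
        elif first == "else" and depth == 0 and stop_at_else:
            return i + 1
        i += 1
    return i                                    # ran off the end of the file

proc = ["if -r x then", "echo yes",
        "if a = b then", "echo inner", "else", "echo inner-else", "endif",
        "else", "echo no", "endif", "echo after"]
print(proc[skip_to(proc, 1, True)])     # echo no
print(proc[skip_to(proc, 8, False)])    # echo after
```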


These commands work together in a way that produces the appearance of a familiar control structure,
although they do little but adjust the Shell's read pointer. Be warned that this implementation
technique does not do a good job of diagnosing extra, missing, or hidden if, else, or endif commands {4.5}; if
you suspect that there are such extra or missing commands, "opt -v" often helps {3.8, 4.1}.


4.5.4 Switch-breaksw-endsw: Multi-way Branch. The switch command manipulates the input file in a
way quite similar to if. It is modeled on the "switch" statement of the C language [8], and like it,
provides an efficient multi-way branch:


switch value
: label1
... commands
: label2
... commands
...
: default
... commands
endsw
Switch reads the input until it finds:


• a statement label that matches value. The label may contain special characters as described in {3.4}:
the method of matching is identical. A few of the many possible labels that could be used to match
the value "thing.c" are:

thing.c   *.c   t*   *   ???????

• default used as a statement label (optional).
• the next unmatched endsw command.


Again, from the Shell’s viewpoint, the only effect of switch is to adjust the read pointer so that the Shell 
effectively skips over part of the procedure, and then continues executing commands following the 
chosen label or endsw. For examples, see ‘‘.profile’’ and ‘‘fsplit’’ in {6}. 


Value is obtained from an argument or from a variable; if the label default is present, it must be the last
label in the list; it indicates a default action to be taken if value matches none of the preceding labels.
The switch construct may be nested; labels enclosed by interior switch-endsw pairs are ignored during the
execution of switch. Breaksw reads the input until the next unmatched endsw and is used to end the
sequence of commands associated with a label. Endsw is a null command like endif.
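The label search performed by switch can be sketched with Python's fnmatch.fnmatchcase, which understands *, ?, and [...] much like the Shell's file-name patterns {3.4}; the correspondence is approximate (fnmatch also accepts [!...], for example). The function returns the index of the line where the Shell would resume reading; the procedure lines are hypothetical.

```python
from fnmatch import fnmatchcase

def switch(lines, i, value):
    """Index where reading resumes after "switch value"; scanning starts at lines[i]."""
    depth = 0                                   # nesting of inner switch-endsw pairs
    while i < len(lines):
        words = lines[i].split()
        first = words[0] if words else ""
        if first == "switch":
            depth += 1
        elif first == "endsw":
            if depth == 0:
                return i + 1                    # no label matched
            depth -= 1
        elif first == ":" and len(words) > 1 and depth == 0:
            label = words[1]
            # default is the last label, so reaching it means nothing else matched
            if label == "default" or fnmatchcase(value, label):
                return i + 1
        i += 1
    return i

proc = ["switch $1", ": *.c", "echo C source", "breaksw",
        ": *.f", "echo Fortran", "breaksw",
        ": default", "echo something else", "endsw", "echo done"]
print(proc[switch(proc, 1, "x.f")])    # echo Fortran
```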


4.5.5 End-of-file and Exit. When the Shell reaches the end-of-file, it terminates execution, returning
to its parent the return code found in $r. The exit command simply seeks to the end-of-file and
returns, setting the return code to the value of its argument, if any. Thus, a procedure can be
terminated "normally" by using exit 0.


4.5.6 While-break-continue-end: Looping. A while-end pair delimits a loop. Break can be used to
terminate execution of such a loop. Continue requests the execution of the next iteration of the loop:


while conditional-expression 
... commands
end 


While evaluates the conditional-expression, which is similar to that of if {4.5.2}. If the
conditional-expression is true, while does nothing, permitting the following lines to be read and interpreted. If the
conditional-expression is false, the input file is searched for a matching end, and command
interpretation resumes with the next line. While-end groupings may be nested to a depth of three.


While treats a single, non-null argument as true and a single null argument or lack of arguments as false. 
This is convenient for the simple case that handles one argument per iteration: 


while "$1"
... do something with $1
shift
end


Break terminates execution of the smallest enclosing while-end group, causing execution to resume after 
the nearest following unmatched end. Exit from n levels is obtained by writing n break commands on 
the same line: 


break; break; ... 


Continue causes execution to resume at the preceding while, i.e., the one that begins the smallest loop 
containing the continue. 


4.5.7 Conditional Operators || and &&. These operators enforce left-to-right execution of commands.
In the line "cmd1 || cmd2", cmd1 is executed and its return code examined. Only if it failed (exit
code non-zero) is cmd2 executed. It is thus a more terse notation for:


cmd1
if $r -ne 0 then
cmd2
endif
The "&&" operator yields the inverse test: in "cmd1 && cmd2", the second command is executed
only if the first succeeds (exit code zero). In the sequence below, each command is executed in order,
until one fails:


cmd1 && cmd2 && cmd3 && ... && cmdn

See "fsplit" and "writemail" in {6} for examples.
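The two operators can be modeled with Python callables that return exit codes. In this sketch the fake commands are hypothetical, and taking the code of the last command actually run as the result of the whole expression is an assumption.

```python
def or_else(cmd1, cmd2):
    """cmd1 || cmd2: run cmd2 only when cmd1 fails (non-zero code)."""
    code = cmd1()
    return cmd2() if code != 0 else code

def and_chain(*cmds):
    """cmd1 && cmd2 && ...: run in order, stopping at the first failure."""
    code = 0
    for cmd in cmds:
        code = cmd()
        if code != 0:
            break
    return code

calls = []
def fake(name, code):
    """Hypothetical command that records its name and returns code."""
    def run():
        calls.append(name)
        return code
    return run

print(and_chain(fake("cc", 0), fake("test", 1), fake("install", 0)), calls)
# 1 ['cc', 'test']
```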


4.5.8 Next: Transfer to Another File. The command "next name" causes the Shell to abandon the
current input and begin reading file name. Next with no arguments causes the Shell to read from the
terminal. By creating a file that initializes Shell variables, then typing "next file" at the terminal,
anyone can have a simple shorthand for setting a number of Shell variables with little typing. See "nx" in
{6}.


4.6 Onintr: Interrupt Handling 


As noted in {2.2}, a program may choose to "catch" an interrupt from the terminal, ignore it
completely, or be terminated by it. Shell procedures can use onintr to obtain the same effects:


onintr [ label ]


Onintr takes several forms: "onintr label" yields the effect of "goto label" on receipt of an interrupt;
"onintr" alone causes normal action to be restored, so that the process terminates on the next
interrupt; "onintr -" causes interrupts to be ignored completely, not only by the Shell, but also by any
commands invoked by it.
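The three forms of onintr correspond closely to the three dispositions a UNIX process can choose for the interrupt signal, SIGINT. The Python sketch below uses the standard signal module; the goto argument is a hypothetical hook standing in for "goto label", and the code must run in a program's main thread.

```python
import signal

def onintr(label=None, goto=None):
    """Sketch of onintr's three forms for the SIGINT (interrupt) signal."""
    if label is None:
        # "onintr": restore the default action, which terminates the process.
        signal.signal(signal.SIGINT, signal.SIG_DFL)
    elif label == "-":
        # "onintr -": ignore interrupts; an ignored signal stays ignored
        # across exec, so commands started afterwards ignore it too.
        signal.signal(signal.SIGINT, signal.SIG_IGN)
    else:
        # "onintr label": on interrupt, call goto(label) (hypothetical hook).
        signal.signal(signal.SIGINT, lambda signum, frame: goto(label))
```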


The most frequent use of onintr is to make sure that temporary files are removed at the end of a pro- 
cedure. The example at the end of {4.3} typically would be written as: 


onintr clean
ls >$s/tmp/$$
... commands
: clean
rm $s/tmp/$$


When "onintr label" is used, interrupts are effective at the time when the label is reached; it is often
desirable to insert another onintr following the label. Even so, there may be a short "window" when
the user can accidentally kill the procedure by causing repeated interrupts in quick succession.


4.7 Special I/O Redirections


As noted in {3.2}, when the Shell is invoked it expects to inherit from its parent an open standard input
(file descriptor 0), standard output (file descriptor 1), and diagnostic output (file descriptor 2). Each of
these is initially connected to the terminal.


4.7.1 Standard Input. When the Shell is invoked to read a command file, it saves the old standard
input (in an invisible place), then opens the command file as the new standard input. The fact that
commands inherit the new standard input is convenient for commands that read in-line data (editor
scripts, etc.) not read by the Shell. However, this mechanism prevents a Shell procedure from acting as
a filter or from reading the old standard input in the way that most C programs do. The Shell solves
this problem by permitting the notation "<--" to allow a command to take its input from the old
standard input, which the Shell has previously saved.⁷


Note that "</dev/tty" and "<--" usually have equivalent effects in a procedure invoked directly
from the terminal. The effects differ in a procedure invoked from within another procedure, unless the
first procedure takes care to invoke the second with "<--". In any case, "<--" is to be preferred,
as it can be used to read from a file or a pipe and is thus more general. See "fsplit" and "lower"
in {6}.


7. The notation "--" arises from the concept of "standard input once removed", because many PWB/UNIX commands accept
"-" in place of a file name to indicate that the current standard input should be read. This choice makes it impossible to
redirect input from a file named "--". Fortunately, file names almost never begin with "-", because many commands
expect "-" to signal a flag of some sort.
4.7.2 Standard Output. The use of ">/dev/tty" redirects output to the terminal, even if used in the
middle of a pipeline. Shell procedures that act as filters sometimes need to do this. The redirection
">/dev/null" causes the standard output of a command to be thrown into a bottomless pit (presumably
to feed the wumpus; see wump(VI)). This is used when you want to execute a command for its
side-effects, but do not want to be bothered by its output.


4.7.3 Diagnostic Output. Most commands direct diagnostics to file descriptor 2 to make sure that they
do not get lost down pipelines. Some situations require that this output go to some place other than
the terminal. For example, a long-running procedure may be started, and then the terminal is hung up.
In this case, it is helpful to save diagnostics in a file. A deficiency of the current Shell is the lack of
syntax for redirecting the diagnostic output. The separate command fd2 performs the required services:


fd2 [ + ] [ -file ] [ --file ] command arguments ...


The "+" flag causes diagnostic output to be merged into the standard output. The second option
writes that output to file; the third appends it to file. If the file name is omitted in the second or third
cases, "msg.out" is used. If no flag is given, "-msg.out" is assumed.
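The flag rules for fd2 can be sketched in Python: fd2_plan decodes a flag as described above, and fd2 uses subprocess.run to start a command with its file descriptor 2 redirected accordingly. Both names are hypothetical, and treating the flags as a single optional argument (rather than allowing several together) is a simplifying assumption.

```python
import subprocess
import sys

def fd2_plan(flag=None):
    """Decode an fd2 flag into (action, file name) following the rules above."""
    if flag is None:
        flag = "-msg.out"                          # default when no flag is given
    if flag == "+":
        return "merge", None                       # diagnostics join standard output
    if flag.startswith("--"):
        return "append", flag[2:] or "msg.out"
    if flag.startswith("-"):
        return "write", flag[1:] or "msg.out"
    raise ValueError("not an fd2 flag: " + flag)

def fd2(argv, flag=None):
    """Run argv with its file descriptor 2 redirected as fd2 would."""
    action, name = fd2_plan(flag)
    if action == "merge":
        return subprocess.run(argv, stderr=subprocess.STDOUT).returncode
    with open(name, "a" if action == "append" else "w") as f:
        return subprocess.run(argv, stderr=f).returncode

# Example: fd2([sys.executable, "-c", "import sys; sys.stderr.write('oops\\n')"], "--log")
```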


4.8 Quoting Revisited 


The main problem with quoting conventions is the need to treat "$" and "\" in ways flexible enough
for convenient use with arguments and variables, but simple enough to be understandable, easy to
implement, and unobtrusive in simple cases. In this respect, the current version of the Shell is far
from elegant, but is reasonable in practice. The rules are:


• Inside single quotes, every character stands for itself without exception. A single quote is not, itself,
allowed within single quotes.

• Inside double quotes, \$ and \" stand for the characters $ and ", respectively, but with all
special meaning removed. All other characters, other than a pair of characters the first of which is
an unescaped "$", behave exactly as they do within single quotes, including a "\" not followed by a
"$" or a double quote.

• Inside double quotes and outside either kind of quotes, any two-character sequence whose first
character is an unescaped "$" is replaced by the value of the corresponding Shell argument or variable;
any variable that has no value (such as "$:") is replaced by a null string.

• Outside either kind of quotes, any two-character sequence whose first character is a "\" is replaced
by the second character of that sequence, but with any special meaning removed.
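The four rules can be checked against a small Python function. It is a sketch: it handles one string, ignores how the Shell splits the result into separate arguments, and looks up $x in a dictionary (values) that stands in for the Shell's arguments and variables.

```python
def expand(text, values):
    """Apply the quoting rules above to one string; values maps "1", "a", ... to strings."""
    out = []
    quote = None                          # None, "'" or '"'
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if quote == "'":                  # single quotes: everything is literal
            if c == "'":
                quote = None
            else:
                out.append(c)
            i += 1
        elif c == "$" and i + 1 < n:      # unescaped $x, outside '...' only
            out.append(values.get(text[i + 1], ""))
            i += 2
        elif quote == '"':                # double quotes: only \$ and \" special
            if c == "\\" and i + 1 < n and text[i + 1] in '$"':
                out.append(text[i + 1])
                i += 2
            elif c == '"':
                quote = None
                i += 1
            else:
                out.append(c)             # includes a lone backslash
                i += 1
        else:                             # outside quotes
            if c == "\\" and i + 1 < n:   # \x stands for a plain x
                out.append(text[i + 1])
                i += 2
            elif c in "'\"":
                quote = c
                i += 1
            else:
                out.append(c)
                i += 1
    return "".join(out)

v = {"1": "abc", "a": "x y"}
print(expand('"$1 \\$1 \\x"', v))   # abc $1 \x
```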


4.9 Creation and Organization of Shell Procedures 


A Shell procedure can be created in two simple steps. The first is that of building an ordinary text file.
The second is that of changing the mode of the file to make it executable, thus permitting it to be
invoked by "name args", rather than "sh name args". The second step may be omitted for a
procedure to be used once or twice and then discarded, but is recommended for longer-lived ones.


Here is the entire input needed to set up a simple procedure (the executable part of "draft" in {6}):

ed
a
nroff -rC3 -T450-12 -mm $*
.
w draft
q
chmod 755 draft


It may then be invoked as "draft file1 file2". If the Shell procedure "draft" were thus created in a
directory whose name appears in the user's ".path" file, the user could change working directories and
still invoke the "draft" command.
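The same two steps (write the text, then set mode 755) can be sketched in Python with os.chmod; make_procedure is a hypothetical helper name.

```python
import os

def make_procedure(path, text):
    """Create a Shell procedure file and mark it executable, like chmod 755."""
    with open(path, "w") as f:
        f.write(text)
    os.chmod(path, 0o755)   # rwxr-xr-x: all may read and execute; owner may write
```

For instance, make_procedure("draft", "nroff -rC3 -T450-12 -mm $*\n") would write the one-line procedure and make it executable in a single call.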


Shell procedures may be created dynamically. A procedure may generate a file of commands, invoke 
another instance of the Shell to execute that file, then remove it. An alternate approach is that of using 
next to make the current Shell execute the new file, allowing use of existing Shell variables and avoid- 
ing the spawning of an additional process for another Shell. In some cases, the need for a temporary 
file may be eliminated by using the Shell in a pipeline. 
Many users prefer to write Shell procedures instead of C programs. First, it is easy to create and
maintain a Shell procedure because it is only an ordinary file of text. Second, it has no corresponding object
program that must be generated and maintained. Third, it is easy to create a procedure "on the fly",
use it a few times, then remove it. Finally, because Shell procedures are usually short in length,
written in a high-level programming language, and kept only in their source-language form, they are
generally easy to find, understand, and modify.


By convention, directories of commands and/or Shell procedures are usually named "bin". Most
groups of users sharing common interests have one or more "bin" directories set up to hold common
procedures. Some users have ".path" files that list several such directories. Although you can have a
number of such directories, it is unwise to go overboard: it may become difficult to keep track of your
environment, and efficiency may suffer {7.3}.


5. MISCELLANEOUS SUPPORTING COMMANDS 


Shell procedures can make use of almost any command. The commands described in this section are 
either used especially frequently in Shell procedures, or are explicitly designed for such use. 


5.1 Echo: Simple Output 


The echo command, invoked as "echo [ args ]", copies its arguments to the standard output, each
followed by a single space, except the last argument, which is followed by a new-line; often, it is used to
prompt the user for input, to issue diagnostics in Shell procedures, or to add a few lines to an output
stream in the middle of a pipeline. Another use is to verify the argument list generation process (as in
{3.4}) before issuing a command that does something drastic. The command "ls" is often replaced by
"echo *" because the latter is faster and prints fewer lines of output.


Echo recognizes several escape sequences. A "\n" yields a new-line character. Echo normally appends
a new-line character to its last argument; a "\c" is used to suppress that new-line character. The
following prompts the user for input and allows input to be typed on the same line as the prompt:


echo 'enter name: \c'
= a </dev/tty 


Echo also recognizes an octal escape sequence for any character, whether printable or not. 
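A Python sketch of how echo assembles its output from its arguments. Octal escapes are not modeled, and dropping any text that follows \c is an assumption taken from POSIX echo, which behaves that way.

```python
def echo(args):
    """Text echo would write for args, handling the \\n and \\c escapes."""
    text = " ".join(args)            # one blank between arguments
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\n", i):
            out.append("\n")         # \n yields a new-line
            i += 2
        elif text.startswith("\\c", i):
            return "".join(out)      # \c: no final new-line (rest dropped; assumption)
        else:
            out.append(text[i])
            i += 1
    return "".join(out) + "\n"       # normal case: final new-line added

print(repr(echo(["enter name: \\c"])))   # 'enter name: '
```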
5.2 Pump: Shell Data Transfer 


Pump is a filter that copies its standard input to its standard output with possible substitution of Shell 
arguments and variables: 


pump [ -[ subchar ] ] [ + ] [ eofstr ]


Pump reads input until an end-of-file, or until it finds eofstr alone on a line. The default eofstr is "!".
Normally, Shell arguments and variables are substituted in the data stream. The flag "-" suppresses
all substitution, while the form "-subchar" causes subchar to be used as the indicator character for
substitution of Shell variables and arguments, instead of "$". Escaping is handled as in strings
enclosed by double quotes: the indicator character may be hidden by preceding it with "\". The "+"
flag causes all leading tab characters in the input to pump to be eliminated; this permits that input to be
indented for readability. A common use of pump is to get Shell variables into editor scripts (see
"edfind" in {6}, for example). Because editor scripts may use "$" for other purposes, readability may
be improved by using a subchar such as "%":


: 'in file $1, change every instance of $2 to $3,'
: 'then delete all lines consisting only of $4'
if -r "$1" then
	pump -% + | ed $1
		g/%2/s//%3/g
		g/^%4$/d
		w
	!
else
	echo "$1: cannot open"
endif


Pump is often used to copy a few lines to another file:


pump >>logfile
here is $1
and here is $2 on a separate line
!
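The substitution rules for pump can be summarized as a small Python model. It is a sketch under stated assumptions, not pump itself. The function names `pump` and `expand` are hypothetical, as is the `variables` dictionary, which maps one-character names such as "1" or "a" to their values. The model also assumes that an unset name expands to nothing, and that the “+” flag strips tabs before the line is compared with eofstr.

```python
def pump(lines, variables, indicator="$", substitute=True,
         strip_tabs=False, eofstr="!"):
    """Copy lines until end of input or eofstr, substituting variables.

    Hypothetical model: `variables` maps one-character names ("1", "a", ...)
    to values; an unset name is replaced by the empty string.
    """
    out = []
    for line in lines:
        if strip_tabs:                  # the "+" flag
            line = line.lstrip("\t")
        if line == eofstr:              # eofstr alone on a line ends the data
            break
        if substitute:                  # the "-" flag turns this off
            line = expand(line, variables, indicator)
        out.append(line)
    return out


def expand(line, variables, indicator):
    result = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and line.startswith(indicator, i + 1):
            result.append(indicator)    # "\" hides the indicator character
            i += 2
        elif ch == indicator and i + 1 < len(line) and line[i + 1].isalnum():
            result.append(variables.get(line[i + 1], ""))
            i += 2
        else:
            result.append(ch)           # everything else is copied unchanged
            i += 1
    return "".join(result)


# The editor-script example, with "%" as the indicator and "+" in effect:
script = ["\tg/%2/s//%3/g", "\tg/^%4$/d", "\tw", "!", "not read"]
print(pump(script, {"2": "old", "3": "new", "4": "junk"},
           indicator="%", strip_tabs=True))
# ['g/old/s//new/g', 'g/^junk$/d', 'w']
```

Because “%” is the indicator, the “$” that ed needs as an end-of-line anchor passes through untouched.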


5.3 Expr: Expression Evaluation 


Expr supports arithmetic and logical operations on integers, and PL/I-like “substr”, “length”, and
“index” operators for string manipulation. It evaluates a single expression and writes the result to the
standard output, where it is typically piped into “=” to be assigned to a variable. Typical examples are:


: 'increment $a'
expr $a + 1 | = a


: 'put 3rd through last characters of $1 into $b'
: 'expr substr abcde 3 1000 returns cde (1000 is just a big number)'
expr substr "$1" 3 1000 | = b


: 'obtain length of $1'
expr length "$1" | = c


The most common uses of expr are in counting for loops and in using “‘substr’’ to pick apart strings. 
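The string operators can be modeled in Python as shown below. This is a sketch and not expr itself, and the function names are hypothetical. The behavior of `index` is an assumption based on later versions of expr, since this section does not spell it out: the sketch returns the first position in the string of any character from a given set.

```python
def expr_substr(s: str, start: int, length: int) -> str:
    # positions count from 1; a length running past the end is truncated
    return s[start - 1:start - 1 + length]


def expr_length(s: str) -> int:
    return len(s)


def expr_index(s: str, chars: str) -> int:
    # assumed semantics: 1-based position of the first character of s
    # that appears anywhere in chars, or 0 if there is none
    for pos, ch in enumerate(s, start=1):
        if ch in chars:
            return pos
    return 0


print(expr_substr("abcde", 3, 1000))   # cde
```

The generous length of 1000 in the Shell example works for the same reason it works here: the substring simply stops at the end of the string.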
5.4 Logname, Logdir, Logtty: Login Data


When a user logs in, he or she supplies a login name and a password. The login program searches the
password file for that login name and obtains the name of the program to be executed by the user (nor-
mally the Shell), the directory to be made the current directory, and also a userid, a value ranging from
0 to 255. Most UNIX protection and identification mechanisms utilize the last item. Limiting the
number of distinct users to 256 is no problem for most UNIX systems, but the original PWB/UNIX instal-
lation currently supports more than 1,000 users. However, it is not necessary to provide a distinct
userid for every user. Project-oriented groups of users often choose to share one or two userids, in
order to ease the problems caused by personnel absences, and also to ease the manipulation of shared
files.⁸ Although the members of such groups do not generally worry about being protected from each
other, they need to be identified as distinct individuals by some programs, i.e., those that tag inter-user
messages with user names or log the name of the user making a change to a source program.
PWB/UNIX records the login name instead of discarding it after login. The logname command writes this
name to the standard output, allowing it to be captured in a Shell variable. It can then be used to permit
only selected users to execute a procedure, or can be included in logging information:


logname | = u
(echo "$u updated files on \c"; date) >>projectlog


The logdir and logtty commands are used in the same way as logname; they produce the same values as
the initial values of $s and $t, respectively {4.3}.


8. Although some groups started by using one userid per person, it was discovered that these users often shared a single
password. Thus, the possession of separate userids was considered more of a hindrance than a help.
6. EXAMPLES OF SHELL PROCEDURES


Some examples in this section may be quite difficult for beginners. For ease of reference, the examples
are arranged alphabetically by name.


.profile:


: '.profile (automatically invoked on login) asks for terminal type,'
: 'reads a line from terminal, loops until a known type'
: '(or empty line) is entered, sets terminal options appropriately,'
: 'asks for new directory name and changes to it, if one is given,'
: 'and then, if file nx exists, transfers to it'
while 1
	echo 'terminal:\c'
	= a </dev/tty
	switch "$a"
	: "DASI450"
	: 450
		stty cr2; tabs +t450; break
	: "GSI/DASI300"
	: gsi
		stty cr2; tabs; break
	: "HP264X"
	: hp
		stty cr0 nl0; tabs +thp; break
	: "TI 700"
	: ti
		stty -tabs nl1 cr1; break
	: default
		if 0"$a" = 0 break
		echo "$a? try 450,gsi,hp,ti"
	endsw
end
echo "cd \c"
= b </dev/tty
if "$b" != "" then
	cd $b
endif
if -r nx then
	next nx
endif


Note: Break is used instead of breaksw in the above example to terminate the while loop, not just the
switch construct.


copypairs: 


: 'copypairs file1 file2 ...'
: 'copy file1 to file2, file3 to file4, ...'
while "$2"
	cp $1 $2
	shift; shift
end
if 0"$1" != 0 echo 'odd number of arguments'


Note: Remember that “shift; shift” is not the same as “shift 2”. See the next example for use of
“shift 2”.


copyto:


: 'copyto dir file ...'
: 'copy argument files to dir, making sure that at least'
: 'two arguments exist, that dir is a directory, and that'
: 'each additional argument is a readable file'
if $n -lt 2 then
	echo 'usage: copyto directory file ...'; exit
endif
if ! -d $1 then
	echo "$1 is not a directory"; exit
endif
while "$2"
	if ! -r $2 then
		echo "$2 not readable"
	else
		cp $2 $1
	endif
	shift 2
end


distinct1:


: 'distinct1'
: 'reads standard input, reports list of identifiers that'
: 'differ only in case, giving lower case form of each'
tr -cs '[A-Z][a-z][0-9]' '[\012*]' <-- | sort -u | tr '[A-Z]' '[a-z]' | sort | uniq -d


Note:


This procedure is an example of the kind of process that is created by the “left-to-right” con-
struction of a long pipeline. It may not be immediately obvious how this works. The tr
translates all characters except letters and digits into new-line characters, and then “squeezes
out” repeated new-line characters. This leaves each identifier (in this case, any contiguous
sequence of letters and digits) on a separate line. Sort sorts the lines and emits only one line
from any sequence of one or more repeated lines. The next tr converts everything to lower
case, so that identifiers differing only in case become identical. The output is sorted again to
bring such duplicates together. The uniq -d prints, once only, those lines that occur more than
once, yielding the desired list.


The process of building such a pipeline uses the fact that pipes and files can usually be inter-
changed; the two lines below are equivalent, assuming that sufficient disk space is available:


cmd1 | cmd2 | cmd3
cmd1 >tmp1; <tmp1 cmd2 >tmp2; <tmp2 cmd3; rm tmp[1-3]


Starting with a file of test data and working from left to right, each command is run taking its
input from the previous file and putting its output in the next file. The final output file is then
examined to make sure that it contains the expected result. The goal is to create a series of
transformations that will convert the input to the desired output. As an exercise, try to mimic
“distinct1” with such a step-by-step process, using a file of test data containing:


ABC: DEF/DEF 
ABC! ABC 
Abc abc 


Although pipelines can give a concise notation for complex processes, exercise some restraint
lest you succumb to the “one-line syndrome” sometimes found among users of especially con-
cise languages. This syndrome often yields incomprehensible code.
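To check an answer to the exercise, here is the same sequence of transformations written in Python, with one stage per statement. It is only a cross-check and not a replacement for the Shell procedure. The name `distinct1` is reused for illustration, and the sketch does not try to reproduce byte-level details of tr or sort such as collation order.

```python
import re
from collections import Counter


def distinct1(text: str) -> list:
    """Python rendering of the distinct1 pipeline, one stage per line."""
    # tr -cs ...: each maximal run of letters and digits is one identifier
    identifiers = re.findall(r"[A-Za-z0-9]+", text)
    # sort -u: one copy of each distinct identifier
    unique = sorted(set(identifiers))
    # tr '[A-Z]' '[a-z]' | sort: fold to lower case and sort again
    lowered = sorted(word.lower() for word in unique)
    # uniq -d: print once each line that occurs more than once
    counts = Counter(lowered)
    return [word for word in sorted(counts) if counts[word] > 1]


test_data = "ABC: DEF/DEF\nABC! ABC\nAbc abc\n"
print(distinct1(test_data))   # ['abc']
```

DEF appears twice in the test data but in only one spelling, so the first sort -u reduces it to a single line and it is not reported.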


distinct2:


: 'distinct2'
: 'reads standard input, reports sorted list of identifiers that differ'
: 'in case only, listing all such distinct identifiers'
onintr cleanup
tr -cs '[A-Z][a-z][0-9]' '[\012*]' <-- | sort -u | tee t1$$ | tr '[A-Z]' '[a-z]' >t2$$
pr -s -t -l1 -m t1$$ t2$$ | sort +1 >t3$$
: 'third argument to pr in above line is "minus ell one"'
sort t3$$ >t4$$
uniq -u -1 t3$$ | sort | comm -23 t4$$ - | sort +1
: cleanup
rm t?$$


Note: This procedure is similar to the previous one, but provides more explicit information. As an
exercise, work through this procedure in the way described above. The commands used here
(plus grep and sed) form the basis for many “data stream” operations.


draft:


: 'draft file ...'
: 'prints the draft (-rC3) of a document on a DASI450 terminal in 12-pitch'
: 'using PWB/MM'
nroff -rC3 -T450-12 -mm $*


Note: Users often write this kind of procedure for convenience in dealing with commands that require
the use of many distinct flags that cannot be given default values that are reasonable for all (or
even most) users.


edfind:


: 'edfind file arg'
: 'find the last occurrence in file of a line that matches arg,'
: 'then print 3 lines (the one before, the line itself, and the one after)'
pump | ed - $1
?$2?;-,+p
!


Note: This illustrates the typical practice of using pump to substitute Shell variables into ed scripts.


edlast:


: 'edlast file'
: 'prints the last line of file, then deletes that line'
ed - $1
$p
$d
w
q
echo done


Note: This procedure illustrates the effects of a command that reads input from a file shared with the
Shell.
fsplit:


: 'fsplit file1 file2'
: 'read standard input and split it into three parts:'
: 'append any line containing at least one letter to file1, any line'
: 'containing digits but no letters to file2, and throw the rest away'
= i 0; = j 0
while 1
	= a <-- || break
	expr $i + 1 | = i
	switch "$a"
	: *[A-Za-z]*
		echo "$a" >>$1; breaksw
	: *[0-9]*
		echo "$a" >>$2; breaksw
	: default
		expr $j + 1 | = j
	endsw
end
echo "$i lines read, $j thrown away"


Note: Each iteration of the loop reads a line from the input and analyzes it. The break terminates the
loop only when “=” encounters an end-of-file.


Don’t use the Shell to read a line at a time unless you must; it can be grotesquely slow {7.2.1}.
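The classification that “fsplit” performs with its switch can be written as a Python function. The sketch shows only the logic and not the procedure's cost. The name `fsplit` and its return value are illustrative assumptions: the function returns the two groups of lines together with the counts that the procedure prints. As in the switch, the order of the cases matters, because a line containing both letters and digits goes to the first file.

```python
import re


def fsplit(lines):
    """Classify lines the way fsplit's switch does.

    Returns (file1_lines, file2_lines, lines_read, thrown_away); in the real
    procedure the first two lists would be appended to $1 and $2.
    """
    file1, file2 = [], []
    read = thrown = 0
    for line in lines:
        read += 1
        if re.search(r"[A-Za-z]", line):     # : *[A-Za-z]*
            file1.append(line)
        elif re.search(r"[0-9]", line):      # : *[0-9]*  (digits, no letters)
            file2.append(line)
        else:                                # : default
            thrown += 1
    return file1, file2, read, thrown


print(fsplit(["abc 12", "345", "--", ""]))
# (['abc 12'], ['345'], 4, 2)
```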
loop: 


: 'loop arg ...'
: 'one or more command lines'
: 'endloop'
: 'execute the group of command lines once for each argument,'
: 'substituting each argument as $1 in the command lines'
onintr cleanup
echo 'while "$1"' >tmp$$
pump - + endloop <-- >>tmp$$
echo 'shift \n end' >>tmp$$
next tmp$$; rm tmp$$
: cleanup
rm tmp$$


Note: Such a procedure is typically used from a terminal to repeat some commands for a list of argu-
ments. It creates a temporary file that sandwiches user input between a while and shift-end. It
then transfers to that file. For example, all files in the current directory could be copied to
“place” by:


loop *
	cp $1 place
	echo $1 copied
endloop


lower: 


: 'lower'
: 'reads standard input, converts it to lower case, writes to standard output'
: 'can thus be used in a pipeline if desired'
tr '[A-Z]' '[a-z]' <--


Note: This is the most common type of use for “<--”.


mkfiles:
: 'mkfiles prefix [number]'
: 'makes number (default = 5) files, named prefix1, prefix2, ...'
= a "$2" 5
= i 1
while $i -le $a
	cp /dev/null $1$i
	expr $i + 1 | = i
end
null:
: 'null file ...'
: 'create each of the named files as an empty file'
while "$1"
	cp /dev/null $1
	shift
end
nx: 
: 'next nx'
: 'asks for module name, initializes variables to useful values,'
: 'prints variables. Note that variables are set within the invoking Shell,'
: 'so nx can be invoked only from terminal or from .profile'
= a /sys/source/s1
= b /usr/man/man1
echo "m: \c"
= m </dev/tty
= g "get -e s.$m; ed $m"
= d "delta s.$m"
pump
a: $a	b: $b
d: $d	g: $g	m: $m
!
next
phone: 
: 'phone initials'
: 'prints the phone number(s) of person with given initials'
echo 'inits ext home'
grep "^$1"
abc 1234 999-2345
def 2234 583-2245
ghi 3342 988-1010
xyz 4567 555-1234
writemail: 


: 'writemail message user'
: 'if that user is logged in, write message on terminal;'
: 'otherwise, mail it to that user'
echo "$1" | ( write "$2" || mail "$2" )


Note: Replacing “echo” above by “pump - <--” writes or mails the standard input, in the same
way as the mail command.
7. EFFECTIVE AND EFFICIENT SHELL PROGRAMMING
7.1 Overall Approach 


This section outlines strategies for writing “efficient” Shell procedures, i.e., ones that do not waste
resources unreasonably in accomplishing their purposes. In the author’s opinion, the primary reason
for choosing the Shell procedure as the implementation method is to achieve a desired result at a
minimum human cost. Emphasis should always be placed on simplicity, clarity, and readability, but
efficiency can also be gained through awareness of a few design strategies. In many cases, an effective
redesign of an existing procedure improves its efficiency by reducing its size, and often increases its
comprehensibility. In any case, one should not worry about optimizing procedures unless they are
intolerably slow or are known to consume a lot of resources.


The same kind of iteration cycle should be applied to Shell procedures as to other programs: write code,
measure it, and optimize only the few important parts. The user should become familiar with the time 
command, which can be used to measure both entire procedures and parts thereof. Its use is strongly 
recommended; human intuition is notoriously unreliable when used to estimate timings of programs, 
even when the style of programming is a familiar one. Each timing test should be run several times, 
because the results are easily disturbed by, for instance, variations in system load. 


7.2 Approximate Measures of Resource Consumption 


7.2.1 Number of Processes Generated. When large numbers of short commands are executed, the
actual execution time of the commands may well be dominated by the overhead of spawning processes.
The CPU overhead per process lies in the range of 0.07 to 0.1 seconds, depending on the specific
hardware configuration. The procedures that incur significant amounts of such overhead are those that
perform much looping, and those that generate command sequences to be interpreted by another Shell.


If you are worried about efficiency, it is important to know which commands are currently built into the
Shell, and which are not. Here is the alphabetical list of those that are built-in:


=          continue   exit       next       switch
break      else       goto       onintr     test
breaksw    end        if         opt        wait
cd         endif      login      pump       while
chdir      endsw      newgrp     shift


Pump actually executes as a child process, i.e., the Shell does a fork, but no exec; “( )” executes in the
same way. Any command not in the above list requires both fork and exec.


The user should always have at least a vague idea of the number of processes generated. In the bulk of 
observed procedures, the number of processes spawned (not necessarily simultaneously) can be 
described by: 


processes = k*n + c


where k and c are constants, and n is the number of procedure arguments, the number of lines in some
input file, the number of entries in some directory, or some other obvious quantity. Efficiency
improvements are most commonly gained by reducing the value of k, sometimes to zero. Any pro-
cedures whose complexity measures include n² terms or higher powers of n are likely to be intolerably
expensive.


As an example, here is an analysis of procedure “fsplit” of {6}. For each iteration of the loop, there is
one expr plus either an echo or another expr. One additional echo is executed at the end. If n is the
number of lines of input, the number of processes is 2*n+1. On the other hand, the number of
processes in the following (equivalent) procedure is 12, regardless of the number of lines of input:




fsplit2: 


onintr cleanup
= b '[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]'
cat <-- >tmp$$
grep "$b" tmp$$ >tmp$$1
grep -v "$b" tmp$$ | grep "[0123456789]" >tmp$$2
cat tmp$$1 >>$1; cat tmp$$2 >>$2
wc -l <tmp$$ | = i
wc -l <tmp$$1 | = j
wc -l <tmp$$2 | = k
expr $i - $j - $k | = a
echo "$i read, $a thrown away"
: cleanup
rm tmp$$*


This version is often ten times faster than “fsplit”, and it is even better for larger input files.
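The cost model can be made concrete with a short Python calculation. It uses the constants just derived (k = 2 and c = 1 for “fsplit”; k = 0 and c = 12 for “fsplit2”) and the per-process CPU overhead quoted in {7.2.1}. The function names are hypothetical. The model counts only process-creation overhead and ignores the work each process does, so it is a rough estimate rather than a timing; the time command remains the real measure.

```python
def process_count(k: int, n: int, c: int) -> int:
    """processes = k*n + c"""
    return k * n + c


def cpu_overhead(processes: int, per_process: float = 0.1) -> float:
    # per_process: the upper end of the 0.07 to 0.1 second range in 7.2.1
    return processes * per_process


for n in (10, 100, 1000):
    slow = process_count(2, n, 1)     # fsplit: k = 2, c = 1
    fast = process_count(0, n, 12)    # fsplit2: k = 0, c = 12
    print(n, slow, fast, round(cpu_overhead(slow), 1), round(cpu_overhead(fast), 1))
```

Reducing k to zero is what makes the second procedure insensitive to input size; its fixed cost of 12 processes is already smaller than that of “fsplit” once the input exceeds five lines.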


Some types of procedures should not be written using the Shell. For example, if one or more processes 
are generated for each character in some file, it is a good indication that the procedure should be rewrit- 
ten in C. 


Shell procedures should not be used to scan or build files a character at a time.


7.2.2 Number of Bytes of Data Accessed. It is worth considering any action that reduces the
number of bytes read or written. This may be important for those procedures whose time is spent pass-
ing data around among a few processes, rather than creating large numbers of short processes. Some
filters shrink their output, while others usually increase it. It always pays to put the “shrinkers” first when
the order is irrelevant. Which of the following is likely to be faster?


sort file | grep pattern 
grep pattern file | sort 
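One way to answer the question is to count the bytes each command has to read. The Python sketch below does this for a small sample. The function names are hypothetical, a plain substring test stands in for grep's regular expressions, and the byte count is only a proxy that ignores the extra work sort does internally. Both orders produce the same lines, but when grep runs first, sort reads only the matching lines.

```python
def sort_then_grep(lines, pattern):
    return [line for line in sorted(lines) if pattern in line]


def grep_then_sort(lines, pattern):
    return sorted(line for line in lines if pattern in line)


def bytes_read(lines, pattern, grep_first):
    """Total bytes read by the two commands in the pipeline."""
    def size(ls):
        return sum(len(line) + 1 for line in ls)   # +1 for each new-line

    matching = [line for line in lines if pattern in line]
    if grep_first:
        return size(lines) + size(matching)   # sort reads only grep's output
    return size(lines) + size(lines)          # grep reads all of sort's output


data = ["apple", "banana", "cherry", "apricot"]
print(bytes_read(data, "ap", grep_first=False), bytes_read(data, "ap", grep_first=True))
```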


7.2.3 Directory Searches. Directory searching can consume a great deal of time, especially in those
applications that utilize deep directory structures and long pathnames. Judicious use of cd can help
shorten long pathnames and thus reduce the number of directory searches needed. As an exercise, try
the following commands (on a fairly quiet system):⁹


time sh -c 'ls -l /usr/bin/* >/dev/null'
time sh -c 'cd /usr/bin; ls -l * >/dev/null'


7.3 Efficient Organization 


7.3.1 Directory Search Order and the .path File. The “.path” file is a popular and convenient mechan-
ism for organizing and sharing procedures. However, it must be used in a sensible fashion, or the
result may be a great increase in system overhead that occurs in a subtle, but avoidable way.


The process of finding a command involves reading every directory included in every pathname that
precedes the needed pathname in the current $p variable. As an example, consider the effect of invok-
ing nroff (/usr/bin/nroff) when $p is “:/bin:/usr/bin”. The sequence of directories read is: “.”, “/”,
“/bin”, “/”, “/usr”, and “/usr/bin”, i.e., a total of six directories. A long “.path” can increase this
number significantly.
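The count of six can be reproduced, and applied to other search paths, with the Python sketch below. It is a simplified model under three assumptions. First, every directory on the way is actually read, with no caching. Second, apart from the empty entry, which means “.”, every entry is an absolute pathname. Third, the command exists in the directory named by `found_in`, a hypothetical parameter of this sketch.

```python
def directories_read(path_var: str, found_in: str) -> list:
    """Directories read while looking for a command that lives in found_in."""
    reads = []
    for entry in path_var.split(":"):
        directory = entry or "."            # empty component: current directory
        if directory == ".":
            reads.append(".")
        else:
            # an absolute pathname is searched from the root, component by component
            reads.append("/")
            prefix = ""
            for part in directory.split("/"):
                if part:
                    prefix += "/" + part
                    reads.append(prefix)
        if directory == found_in:
            break                            # found: later components are not read
    return reads


print(directories_read(":/bin:/usr/bin", "/usr/bin"))
# ['.', '/', '/bin', '/', '/usr', '/usr/bin']
```

Under this model, a command kept in “/bin” costs twelve directory reads with the worst of the search paths in the following examples and only two with the last of them.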


The vast majority of command executions are of commands found in “/bin” and, to a lesser extent, in
“/usr/bin”. Careless “.path” setup may lead to a great deal of unnecessary searching. The following
four examples are ordered from worst to best (at least with regard to efficiency):


9. You may have to do some reading in the PWB/UNIX User’s Manual [3] to understand exactly what is going on in these
examples.
:/al/tf/jtb/bin:/al/tf/bin:/bin:/usr/bin
:/bin:/al/tf/jtb/bin:/al/tf/bin:/usr/bin
:/bin:/usr/bin:/al/tf/jtb/bin:/al/tf/bin
/bin::/usr/bin:/al/tf/jtb/bin:/al/tf/bin


The first one above should be avoided. The others are acceptable; the choice among them is dictated by
the rate of change in the set of commands kept in “/bin” and “/usr/bin”.


A procedure that is expensive because it invokes many short-lived commands may often be sped up
by changing $p to resemble the last of the above four examples.


7.3.2 Good Ways to Set up Directories. It is wise to avoid directories that are larger than necessary.
You should be aware of several “magic sizes”. A directory that contains entries for up to 30 files (plus
the required “.” and “..”) fits in a single disk block and can be searched very efficiently. One that has
up to 254 entries is still a “small” file; anything larger is usually a disaster when used as a working
directory. It is especially important to keep login directories small, preferably one block at most.
ABSTRACT


This paper is meant to help new users get started on UNIX. It covers: 


• basics needed for day-to-day use of the system: typing commands, correct-
ing typing mistakes, logging in and out, mail, inter-console communication, the
file system, printing files, redirecting I/O, pipes, and the shell.


• document preparation: a brief tutorial on the ROFF formatter for beginners,
hints on preparing documents, and capsule descriptions of some supporting
software.


• UNIX programming: using the editor, programming the shell, programming
in C, other languages.


There is also an annotated UNIX bibliography. 


UNIX for Beginners 


Brian W. Kernighan 
Bell Laboratories, Murray Hill, N. J. 


In many ways, UNIX is the state of the art 
in computer operating systems. From the 
user’s point of view, it is easy to learn and use, 
and presents few of the usual impediments to 
getting the job done. 

It is hard, however, for the beginner to 
know where to start, and how to make the best
use of the facilities available. The purpose of 
this introduction is to point out high spots for 
new users, so they can get used to the main 
ideas of UNIX and start making good use of it 
quickly. 

This paper is not an attempt to re-write 
the UNIX Programmer's Manual; often the discus- 
sion of something is simply “read section x in 
the manual.” (This implies that you will need a
copy of the UNIX Programmer's Manual.) Rather,
it suggests in what order to read the manual,
and it collects together things that are stated 
only indirectly in the manual. 


There are five sections: 


1. Getting Started: How to log in to a UNIX, how to type, what to do about mistakes in
typing, how to log out. Some of this is dependent on which UNIX you log into
(phone numbers, for example) and what terminal you use, so this section must
necessarily be supplemented by local information.


2. Day-to-day Use: Things you need every day to use UNIX effectively: generally useful
commands; the file system.


3. Document Preparation: Preparing manuscripts is one of the most common
uses for UNIX. This section contains advice, but not extensive instructions on
any of the formatting programs.


4. Writing Programs: UNIX is an excellent vehicle for developing programs. This
section talks about some of the tools, but again is not a tutorial in any of the pro-
gramming languages that UNIX provides.


5. A UNIX Reading List: An annotated bibliography of documents worth reading by
new users.


I. GETTING STARTED 


Logging In 

Most of the details about logging in are in
the manual section called “How to Get Started”
(pages iv-v in the 5th Edition). Here are a cou-
ple of extra warnings.


You must have a UNIX login name, which
you can get from whoever administers your
system. You also need to know the phone
number. UNIX is capable of dealing with a
variety of terminals: Terminet 300’s; Execu-
port, TI and similar portables; video terminals;
GSI’s; and even the venerable Teletype in its
various forms. But note: UNIX will not handle
IBM 2741 terminals and their derivatives (e.g.,
some Anderson-Jacobsons, Novar). Further-
more, UNIX is strongly oriented towards devices
with lower case. If your terminal produces only
upper case (e.g., model 33 Teletype), life will be
so difficult that you should look for another ter-
minal.


Be sure to set the switches appropriately
on your device: speed (if it’s variable) to 30
characters per second, lower case, full duplex,
even parity, and any others that local wisdom
advises. Establish a connection using whatever
magic is needed for your terminal. UNIX should
type “login:” at you. If it types garbage, you
may be at the wrong speed; push the ‘break’ or
‘interrupt’ key once. If that fails to produce a
login message, consult a guru.


When you get a “login:” message, type 
your login name in lower case. Follow it by a 
RETURN if the terminal has one. If a password 
is required, you will be asked for it, and (if possible)
printing will be turned off while you type
it, again followed by a RETURN. (On M37 Tele- 
types always use NEWLINE or LINEFEED in place 
of RETURN). 


The culmination of your login efforts is a 
percent sign “%”. The percent sign means that
UNIX is ready to accept commands from the 
terminal. (You may also get a message of the 
day just before the percent sign or a 
notification that you have mail.) 


Typing Commands 


Once you've seen the percent sign, you 
can type commands, which are requests that 
UNIX do something. Try typing 


date 
followed by RETURN. You should get back some- 
thing like 

Sun Sep 22 10:52:29 EDT 1974 
Don’t forget the RETURN after the command, or
nothing will happen. If you think you’re being
ignored, type a RETURN; something should hap-
pen. We won’t show the carriage returns, but
they have to be there.


Another command you might try is who, 
which tells you everyone who is currently logged 
in: 


who 


gives something like 


pip   ttyf  Sep 22 09:40
bwk   ttyg  Sep 22 09:48
mel   ttyh  Sep 22 09:58


The time is when the user logged in. 


If you make a mistake typing the command 
name, UNIX will tell you. For example, if you 


type 
whom 
you will be told 


whom: not found 


Strange Terminal Behavior 


Sometimes you can get into a state where 
your terminal acts strangely. For example, each 
letter may be typed twice, or the RETURN may 
not cause a line feed. You can often fix this by 
logging out and logging back in. Or you can read 
the description of the command stty in section I
of the manual. This will also tell you how to get
intelligent treatment of tab characters (which are 
much used in UNIX) if your terminal doesn’t 
have tabs. If it does have computer-settable 
tabs, the command tabs will set the stops 
correctly for you. 


Mistakes in Typing 


If you make a typing mistake, and see it
before the carriage return has been typed, there
are two ways to recover. The sharp-character
“#” erases the last character typed; in fact suc-
cessive uses of “#” erase characters back to the
beginning of the line (but not beyond). So if
you type badly, you can correct as you go:


dd#atie##e


is the same as “date”.


The at-sign “@” erases all of the charac-
ters typed so far on the current input line, so if
the line is irretrievably fouled up, type an “@”
and start over (on the same line!).


What if you must enter a sharp or at-sign
as part of the text? If you precede either “#” or
“@” by a backslash “\”, it loses its erase mean-
ing. This implies that to erase a backslash, you
have to type two sharps or two at-signs. The
backslash is used extensively in UNIX to indicate
that the following character is in some way spe-
cial.
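The erase and kill rules can be modeled as a function that turns the characters typed into the line a program receives. This Python sketch is hypothetical and is not the system's terminal-handling code. The name `cook_line` is illustrative. The model assumes that a backslash followed by “#” or “@” becomes that character literally and that the backslash itself is discarded. That assumption is what makes two sharps necessary to erase a backslash, as described above.

```python
def cook_line(typed: str, erase: str = "#", kill: str = "@") -> str:
    """Return the line a program would receive after erase/kill processing."""
    line = []
    i = 0
    while i < len(typed):
        ch = typed[i]
        nxt = typed[i + 1] if i + 1 < len(typed) else ""
        if ch == "\\" and nxt in (erase, kill):
            line.append(nxt)        # backslash: take "#" or "@" literally
            i += 2
            continue
        if ch == erase:
            if line:                # never erase past the start of the line
                line.pop()
        elif ch == kill:
            line.clear()            # "@" discards everything typed so far
        else:
            line.append(ch)
        i += 1
    return "".join(line)


print(cook_line("dd#atie##e"))   # date
```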


Readahead 


UNIX has full readahead, which means that 
you can type as fast as you want, whenever you 
want, even when some command is typing at
you. If you type during output, your input char- 
acters will appear intermixed with the output 
characters, but they will be stored away by UNIX 
and interpreted in the correct order. So you can 
type two commands one after another without 
waiting for the first to finish or even begin. 


Stopping a Program 

You can stop most programs by typing the
character “DEL” (perhaps called “delete” or
“rubout” on your terminal). There are excep-
tions, like the text editor, where DEL stops what-
ever the program is doing but leaves you in that
program. You can also just hang up the phone.
The “interrupt” or “break” key found on most
terminals has no effect.


Logging Out 
The easiest way to log out is to hang up the
phone. You can also type


login name-of-new-user


and let someone else use the terminal you were
on. It is not sufficient just to turn off the termi-
nal. UNIX has no time-out mechanism, so you’ll
be there forever unless you hang up.


Mail 
When you log in, you may sometimes get the message

	You have mail.

UNIX provides a postal system so you can send and receive letters from other users of the system. To read your mail, issue the command

	mail

Your mail will be printed, and then you will be asked

	Save?

If you do want to save the mail, type y, for “yes”; any other response means “no”.

How do you send mail to someone else? Suppose it is to go to “joe” (assuming “joe” is someone’s login name). The easiest way is this:

	mail joe
	now type in the text of the letter
	on as many lines as you like ...
	after the last line of the letter
	type the character “control-d”,
	that is, hold down “control” and type
	a letter “d”.

And that’s it. The “control-d” sequence, usually called “EOT”, is used throughout UNIX to mark the end of input from a terminal, so you might as well get used to it.

There are other ways to send mail: you can send a previously prepared letter, and you can mail to a number of people all at once. For more details see mail (I).

The notation mail (I) means the command mail in section (I) of the UNIX Programmer’s Manual.


Writing to other users

At some point in your UNIX career, out of the blue will come a message like

	Message from joe...

accompanied by a startling beep. It means that Joe wants to talk to you, but unless you take explicit action you won’t be able to talk back. To respond, type the command

	write joe

This establishes a two-way communication path. Now whatever Joe types on his terminal will appear on yours and vice versa. The path is slow, rather like talking to the moon. (If you are in the middle of something, you have to get to a state where you can type a command. Normally, whatever program you are running has to terminate or be terminated. If you’re editing, you can escape temporarily from the editor; read the manual.)

A protocol is needed to keep what you type from getting garbled up with what Joe types. Typically it’s like this:

	Joe types “write smith” and waits.

	Smith types “write joe” and waits.

	Joe now types his message (as many lines as he likes). When he’s ready for a reply, he signals it by typing (o), which stands for “over”.

	Now Smith types a reply, also terminated by (o).

	This cycle repeats until someone gets tired; he then signals his intent to quit with (o-o), for “over and out”.

	To terminate the conversation, each side must type a “control-d” character alone on a line. (“Delete” also works.) When the other person types his “control-d”, you will get the message “EOT” on your terminal.

If you write to someone who isn’t logged in, or who doesn’t want to be disturbed, you’ll be told. If the target is logged in but doesn’t answer after a decent interval, simply type “control-d”.


On-line Manual

The UNIX Programmer’s Manual is typically kept on-line. If you get stuck on something, and can’t find an expert to assist you, you can print on your terminal some manual section that might help. It’s also useful for getting the most up-to-date information on a command. To print a manual section, type “man section-name”. Thus to read up on the who command, type

	man who

If the section in question isn’t in part I of the manual, you have to give the section number as well, as in

	man 6 chess

Of course you’re out of luck if you can’t remember the section name.


II. DAY-TO-DAY USE


Creating Files: The Editor

If we have to type a paper or a letter or a program, how do we get the information stored in the machine? Most of these tasks are done with the UNIX “text editor” ed. Since ed is thoroughly documented in ed (I) and explained in A Tutorial Introduction to the UNIX Text Editor, we won’t spend any time here describing how to use it. All we want it for right now is to make some files. (A file is just a collection of information stored in the machine, a simplistic but adequate definition.)

To create a file with some text in it, do the following:

	ed	(invokes the text editor)
	a	(command to “ed”, to add text)
	now type in
	whatever text you want ...
	.	(signals the end of adding text)


At this point we could do various editing opera- 
tions on the text we typed in, such as correcting 
spelling mistakes, rearranging paragraphs and the 
like. Finally, we write the information we have 
typed into a file with the editor command “w”; 


w junk 
ed will respond with the number of characters it 
wrote into the file called “junk”. 


Suppose we now add a few more lines with 
“a” terminate them with “.”, and write the 
whole thing out as “temp”, using 


w temp 


We should now have two files, a smaller one 
called “junk” and a bigger one (bigger by the 


extra lines) called “temp”. Type a ‘q” to quit | 


the editor. 


What files are out there? 


The ls (for "list") command lists the
names (not contents) of any of the files that
UNIX knows about. If we type

ls

the response will be

junk
temp

which are indeed our two files. They are sorted
into alphabetical order automatically, but other
variations are possible. For example, if we add
the optional argument "-t",

ls -t

lists them in the order in which they were last
changed, most recent first. The "-l" option gives
a "long" listing:

ls -l

will produce something like

-rw-rw-rw-  1 bwk  41 Sep 22 12:56 junk
-rw-rw-rw-  1 bwk  78 Sep 22 12:57 temp

The date and time are of the last change to the
file. The 41 and 78 are the number of characters
(you got the same thing from ed). "bwk" is the
owner of the file, the person who created it.
The "-rw-rw-rw-" tells who has permission to
read and write the file, in this case everyone.

Options can be combined: "ls -lt" would
give the same thing, but sorted into time order.
You can also name the files you're interested in,
and ls will list the information about them only.
More details can be found in ls (I).

It is generally true of UNIX programs that
"flag" arguments like "-t" precede filename
arguments.
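On a present-day system the same experiment can be scripted. The sketch below is written for a modern POSIX sh rather than the PWB shell; it assumes mktemp -d is available for a scratch directory, uses printf in place of the ed session, and sets the change times with touch -t so that "-t" has distinct times to sort on.

```shell
#!/bin/sh
# Work in a throwaway directory so none of your own files are touched.
cd "$(mktemp -d)" || exit 1

# Create the two files; printf stands in for the ed session above.
printf 'a few lines\nof text\n' >junk
printf 'a few lines\nof text\nplus some extra lines\n' >temp

# Give them distinct change times (CCYYMMDDhhmm) a minute apart.
touch -t 202309221256 junk
touch -t 202309221257 temp

ls        # alphabetical order: junk, then temp
ls -t     # most recently changed first: temp, then junk
ls -lt    # long listing, sorted into time order
```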


Printing Files 


Now that you’ve got a file of text, how do 
you print it so people can look at it? There are a 
host of programs that do that, probably more 
than are needed. 


One simple thing is to use the editor, since 
printing is often done just before making 
changes anyway. You can say 


ed junk 
1,$p


ed will reply with the count of the characters in 
“junk” and then print all the lines in the file. 
After you learn how to use the editor, you can 
be selective about the parts you print. 


There are times when it’s not feasible to 
use the editor for printing. For example, there is 
a limit on how big a file ed can handle (about 
65,000 characters or 4000 lines). Secondly, it 
will only print one file at a time, and sometimes 
you want to print several, one after another. So 
here are a couple of alternatives. 

First is cat, the simplest of all the printing 
programs. cat simply copies all the files in a list 
onto the terminal. So you can say 


cat junk 
or, to print two files, 
cat junk temp
The two files are simply concatenated (hence the
name "cat") onto the terminal.


pr produces formatted printouts of files. 
As with cat, pr prints all the files in a list. The
difference is that it produces headings with date,
time, page number and file name at the top of
each page, and extra lines to skip over the fold
in the paper. Thus,

pr junk temp

will list "junk" neatly, then skip to the top of a
new page and list "temp" neatly.


pr will also produce multi-column output: 
pr -3 junk 


prints “junk” in 3-column format. You can use 
any reasonable number in place of “3” and pr 
will do its best. 


It should be noted that pr is not a formatting
program in the sense of shuffling lines
around and justifying margins. The true
formatters are roff, nroff, and troff, which we will
get to in the section on document preparation.


There are also programs that print files on 
a high-speed printer. Look in your manual 
under opr and lpr. Which to use depends on the
hardware configuration of your machine. 


Shuffling Files About 


Now that you have some files in the file 
system and some experience in printing them, 
you can try bigger things. For example, you can 
move a file from one place to another (which 
amounts to giving a file a new name), like this: 


mv junk precious 


This means that what used to be “junk” is now 
“precious”. If you do an ls command now, you 
will get 


precious 
temp


Beware that if you move a file to another one 
that already exists, the already existing contents
are lost forever. 


If you want to make a copy of a file (that is,
to have two versions of something), you can use
the cp command:

cp precious temp1

makes a duplicate copy of "precious" in
"temp1".

Finally, when you get tired of creating and 
moving files, there is a command to remove files 


from the file system, called rm. 
rm temp temp1


will remove all of the files named. You will get a 
warning message if one of the named files wasn’t 
there. 
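The renaming, copying and removing described above can be tried safely in a scratch directory. This is a minimal sketch for a modern POSIX sh (mktemp -d and cmp are assumed to be available); it also shows the warning about mv concretely, since the old contents of "temp" disappear without any message.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
echo 'old contents' >junk
echo 'keep me' >temp

mv junk precious        # rename: junk is gone, precious holds its text
ls                      # lists precious and temp

cp precious temp1       # duplicate: temp1 is an exact copy
cmp precious temp1 && echo 'identical copies'

mv precious temp        # temp already existed; "keep me" is now lost
cat temp                # prints: old contents

rm temp1                # remove a file by name
ls                      # only temp is left
```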


Filename, What’s in a 


So far we have used filenames without ever 
saying what's a legal name, so it's time for a couple
of rules. First, filenames are limited to 14
characters, which is enough to be descriptive.
Second, although you can use almost any character
in a filename, common sense says you should
stick to ones that are visible, and that you should
probably avoid characters that might be used
with other meanings. We already saw, for example,
that in the ls command, "ls -t" meant to list
in time order. So if you had a file whose name
was "-t", you would have a tough time listing it
by name. There are a number of other characters
which have special meaning either to UNIX
as a whole or to numerous commands. To avoid
pitfalls, you would probably do well to use only
letters, numbers and the period. (Don't use the
period as the first character of a filename, for
reasons too complicated to go into.)
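To see why a name like "-t" is awkward, and one way around it on a present-day system, here is a small sketch for a modern POSIX sh (scratch directory from mktemp -d, an assumption). Prefixing the name with "./" turns it into a pathname that no longer begins with "-", so commands stop reading it as a flag; this workaround is not described in the original text.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
echo 'notes' >other
echo 'oops' >./-t        # a file whose name looks like a flag

ls -t                    # "-t" is read as the time-order flag, so
                         # every file is listed, not just the one named -t
ls ./-t                  # a pathname starting with ./ is never a flag
cat ./-t                 # prints: oops
rm ./-t
```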


On to some more positive suggestions. 
Suppose you're typing a large document like a 
book. Logically this divides into many small
pieces, like chapters and perhaps sections.
Physically it must be divided too, for ed will not
handle big files. Thus you should type the
document as a number of files. You might have a
separate file for each chapter, called

chap1
chap2
etc...

Or, if each chapter were broken into several files,
you might have

chap1.1
chap1.2
chap1.3

chap2.1
chap2.2

...

You can now tell at a glance where a particular
file fits into the whole.

There are advantages to a systematic naming
convention which are not obvious to the novice
UNIX user. What if you wanted to print the
whole book? You could say

pr chap1.1 chap1.2 chap1.3 ......


but you would get tired pretty fast, and would 
probably even make mistakes. Fortunately, 
there is a shortcut. You can say 


pr chap* 


The "*" means "anything at all", so this
translates into "print all files whose names begin
with 'chap'", listed in alphabetical order. This
shorthand notation is not a property of the pr
command, by the way. It is system-wide, a service
of the program that interprets commands
(the "shell" sh (I)). Using that fact, you can see
how to list the files of the book:

ls chap*

produces

chap1.1
chap1.2
chap1.3
...

The "*" is not limited to the last position in a
filename; it can be anywhere. Thus

rm *junk*

removes all files that contain "junk" as any part
of their name. As a special case, "*" by itself
matches every filename, so

pr *

prints all the files (alphabetical order), and

rm *

removes all files. (You had better be sure that's
what you wanted to say!)


The “*" is not the only pattern-matching 
feature available. Suppose you want to print 
only chapters 1 through 4 and 9 of the book. 
Then you can say 


pr chap[12349]*

The "[...]" means to match any of the characters
inside the brackets. You can also do this with

pr chap[1-49]*

"[a-z]" matches any character in the range a
through z. There is also a "?" character, which
matches any single character, so

pr ?

will print all files which have single-character
names.

Of these niceties, "*" is probably the most
useful, and you should get used to it. The others
are frills, but worth knowing.

If you should ever have to turn off the special
meaning of "*", "?", etc., enclose the entire
argument in quotes (single or double), as in

ls "?"
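Because the shell, not pr or ls, expands these patterns, echo is a convenient way to watch the expansion happen before handing the names to a real command. A minimal sketch for a modern POSIX sh; the file names are hypothetical, chosen to match the book example, and the scratch directory comes from mktemp -d.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical book files, named as in the text, plus two one-letter files.
touch chap1.1 chap1.2 chap1.3 chap2.1 chap4.1 chap9.1 a b

echo chap*           # every name beginning with "chap", sorted
echo chap[12349]*    # chapters 1, 2, 3, 4 and 9 (here 3 does not exist)
echo chap[1-49]*     # the same set: the range 1-4, plus 9
echo ?               # single-character names: a b
echo '?'             # quoted, so no matching: a literal ?
```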


What’s in a Filename, Continued 


When you first made that file called 
“junk”, how did UNIX know that there wasn’t 
another “junk” somewhere else, especially since 
the person in the next office is also reading this 
tutorial? The reason is that generally each user 
of UNIX has his own "directory", which contains
only the files that belong to him. When you
create a new file, unless you take special action,
the new file is made in your own directory, and
is unrelated to any other file of the same name
that might exist in someone else's directory.

The set of all files that UNIX knows about
is organized into a (usually big) tree, with your
files located several branches up into the tree. It
is possible for you to "walk" around this tree,
and to find any file in the system, by starting at
the root of the tree and walking along the right
set of branches.


To begin, type 
ls /

"/" is the name of the root of the tree (a
convention used by UNIX). You will get a response
something like this:

bin
dev
etc
tmp
usr


This is a collection of the basic directories of 
files that UNIX knows about. On most systems, 
"usr" is a directory that contains all the normal
users of the system, like you. Now try

ls /usr

This should list a long series of names, among
which is your own login name. Finally, try

ls /usr/your-name

You should get what you get from a plain

ls

Now try

cat /usr/your-name/junk

(if "junk" is still around). The name

/usr/your-name/junk

is called the "pathname" of the file that you
normally think of as "junk". "Pathname" has an
obvious meaning: it represents the full name of
the path you have to follow through the tree of
directories to get to a particular file. It is a
universal rule in UNIX that anywhere you can
use an ordinary filename, you can use a
pathname.


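Here is a picture which may make this
clearer:

[Figure: part of the file tree. The root contains the directories bin, etc, usr, dev and tmp; usr contains the users' directories, among them Mary's and Eve's, and each of those holds its own file named junk; a file named temp also appears.]

Notice that Mary's "junk" is unrelated to
Eve's.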


This isn’t too exciting if all the files of in- 
terest are in your own directory, but if you work 
with someone else or on several projects con- 
currently, it becomes handy indeed. For exam- 
ple, your friends can print your book by saying 


pr /usr/your-name/chap* 


Similarly, you can find out what files your neigh- 
bor has by saying 


ls /usr/neighbor-name 
or make your own copy of one of his files by 


cp /usr/your-neighbor/his-file yourfile 


(If your neighbor doesn't want you poking 
around in his files, or vice versa, privacy can be 
arranged. Each file and directory can have 
read-write-execute permissions for the owner, a 
group, and everyone else, to control access. See 
ls (I) and chmod (I) for details. As a matter of
observed fact, most users most of the time find 
openness of more benefit than privacy.) 


As a final experiment with pathnames, try 
ls /bin /usr/bin 


Do some of the names look familiar? When you 
run a program, by typing its name after a "%",
the system simply looks for a file of that name.
It looks first in your directory (where it typically
doesn't find it), then in "/bin" and finally in
"/usr/bin". There is nothing magic about
commands like cat or ls, except that they have been
collected into two places to be easy to find and 
administer. 


What if you work regularly with someone 
else on common information in his directory? 
You could just log in as your friend each time 
you want to, but you can also say “I want to 
work on his files instead of my own”. This is 
done by changing the directory that you are 
currently in: 


chdir /usr/your-friend

Now when you use a filename in something like
cat or pr, it refers to the file in "your-friend's"
directory. Changing directories doesn't affect 
any permissions associated with a file — if you 
couldn't access a file from your own directory, 
changing to another directory won't alter that 
fact. 


If you forget what directory you're in, type 
pwd 


(“print working directory”) to find out. 


It is often convenient to arrange one’s files 
so that all the files related to one thing are in a 
directory separate from other projects. For
example, when you write your book, you might
want to keep all the text in a directory called
book. So make one with

mkdir book

then go to it with

chdir book

then start typing chapters. The book is now
found in (presumably)

/usr/your-name/book

To delete a directory, see rmdir (I).

You can go up one level in the tree of files
by saying

chdir ..

".." is the name of the parent of whatever
directory you are currently in. For completeness,
"." is an alternate name for the directory you are in.
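The same walk through the tree can be replayed in a modern POSIX sh. One assumption to flag: present-day shells generally spell chdir as cd, and the scratch directory again comes from mktemp -d rather than /usr/your-name.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
start=$(pwd)             # remember where we began

mkdir book               # a directory for the book
cd book                  # present-day shells spell chdir as cd
pwd                      # ends in /book
touch chap1.1            # created inside book
cd ..                    # ".." is the parent directory
pwd                      # back at the starting directory
ls ./book                # "." is the current directory: prints chap1.1
rmdir book 2>/dev/null || echo 'rmdir refuses: book is not empty'
```

As the last line shows, rmdir only removes an empty directory, so the chapters have to be removed (or moved) first.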


Using Files instead of the Terminal 


Most of the commands we have seen so far 
produce output on the terminal; some, like the 
editor, also take their input from the terminal. It
is universal in UNIX that the terminal can be
replaced by a file for either or both of input and
output. As one example, you could say

ls

to get a list of files. But you can also say

ls >filelist

to get a list of your files in the file "filelist".
("filelist" will be created if it doesn't already
exist, or overwritten if it does.) The symbol ">" is
used throughout UNIX to mean "put the output
on the following file, rather than on the
terminal". Nothing is produced on the terminal. As
another example, you could concatenate several
files into one by capturing the output of cat in a
file:

cat f1 f2 f3 >temp

Similarly, the symbol "<" means to take
the input for a program from the following file,
instead of from the terminal. Thus, you could
make up a script of commonly used editing
commands and put them into a file called "script".
Then you can run the script on a file by saying

ed file <script
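Both directions of redirection can be seen in a few lines of modern POSIX sh. This sketch uses wc -l as the program reading from a file, since ed is not installed on every present-day system; the scratch directory from mktemp -d is likewise an assumption.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'one\n' >f1
printf 'two\n' >f2
printf 'three\n' >f3

ls >filelist            # output goes to the file; nothing on the terminal
cat f1 f2 f3 >temp      # concatenate three files into one
cat filelist            # f1, f2, f3 and filelist itself, since the
                        # shell creates filelist before ls runs
wc -l <temp             # input taken from temp: prints 3
```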


Pipes 

One of the novel contributions of UNIX is 
the idea of a pipe. A pipe is simply a way to 
connect the output of one program to the input 
of another program, so the two run as a
sequence of processes: a pipe-line.


For example, 


pr f g h


will print the files “f", “g” and “h", beginning 
each on a new page. Suppose you want them 
run together instead. You could say 


cat f g h >temp 
pr temp 
rm temp 


but this is more work than necessary. Clearly 
what we want is to take the output of cat and 
connect it to the input of pr. So let us use a 
pipe:

cat f g h | pr 


The vertical bar means to take the output from 
cat, which would normally have gone to the ter- 
minal, and put it into pr, which formats it neatly. 


Any program that reads from the terminal 
can read from a pipe instead; any program that 
writes on the terminal can drive a pipe. You can 
have as many elements in a pipeline as you 
wish. 

Many UNIX programs are written so that 
they will take their input from one or more files 
if file arguments are given; if no arguments are 
given they will read from the terminal, and thus 
can be used in pipelines. 
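The temporary-file version and the pipe version can be compared directly. In this modern POSIX sh sketch, wc -l and sort stand in for pr, because pr stamps each page header with a date and would make the output vary from run to run; the file contents are invented for illustration.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'alpha\n' >f
printf 'beta\n' >g
printf 'gamma\n' >h

# The long way: a temporary file between two commands.
cat f g h >temp
wc -l <temp
rm temp

# The pipe: cat's output feeds wc directly, with no temporary file.
cat f g h | wc -l

# A pipeline can have as many stages as you like.
cat f g h | sort -r | head -n 1    # prints: gamma
```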


The Shell 


We have already mentioned once or twice 
the mysterious “shell,” which is in fact sh (I). 
The shell is the program that interprets what you 
type as commands and arguments. It also looks 
after translating “*”, etc., into lists of filenames. 


The shell has other capabilities too. For
example, you can start two programs with one
command line by separating the commands with
a semicolon; the shell recognizes the semicolon
and breaks the line into two commands. Thus

date; who

does both commands before returning with a
"%".

You can also have more than one program 
running simultaneously if you wish. For example,
if you are doing something time-consuming, like
the editor script of an earlier section, and you
don't want to wait around for the results before
starting something else, you can say

ed file <script &

The ampersand at the end of a command line
says "start this command running, then take
further commands from the terminal
immediately." Thus the script will begin, but you can do
something else at the same time. Of course, to
keep the output from interfering with what
you're doing on the terminal, it would be better
to have said

ed file <script >lines &

which would save the output lines in a file called
"lines".

When you initiate a command with “&”, 
UNIX replies with a number called the process 
number, which identifies the command in case 
you later want to stop it. If you do, you can say 


kill process-number 


You might also read ps (I).
You can say

(command-1; command-2; command-3) &

to start these commands in the background, or
you can start a background pipeline with

command-1 | command-2 &
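In a shell script, as opposed to at the terminal, the process number of the most recent background command is available as $! in a modern POSIX sh. The sketch below assumes that shell; sleep 300 stands in for a long-running job, and the other commands are placeholders for the command list and pipeline forms shown above.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

sleep 300 &                        # a slow command; the shell returns at once
pid=$!                             # its process number
echo "background process number: $pid"

(date; echo done) >log &           # a command list in the background
printf 'b\na\n' | sort >sorted &   # a background pipeline

kill "$pid"     # stop the sleep, as "kill process-number" does
wait            # wait for every remaining background job to finish
```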


Just as you can tell the editor or some 
similar program to take its input from a file in- 
stead of from the terminal, you can tell the shell 
to read a file to get commands. (Why not? The
shell after all is just a program, albeit a clever
one.) For instance, suppose you want to set tabs
on your terminal, and find out the date and
who's on the system every time you log in.
Then you can put the three necessary
commands (tabs; date; who) into a file, let's call it
"xxx", and then run it with either

sh xxx

or

sh <xxx

This says to run the shell with the file "xxx" as
input. The effect is as if you had typed the
contents of "xxx" on the terminal. (If this is to be
a regular thing, you can eliminate the need to
type "sh"; see chmod (I) and sh (I).)

The shell has quite a few other capabilities
as well, some of which we'll get to in the section
on programming. 
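A command file can be run in each of the three ways just mentioned. This is a sketch for a modern POSIX sh, and it makes one substitution to keep the output predictable: the file "xxx" holds echo commands standing in for tabs, date and who.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# A command file like "xxx" in the text; echo lines are placeholders.
cat >xxx <<'EOF'
#!/bin/sh
echo setting tabs
echo the date
echo who is on
EOF

sh xxx          # run the shell with xxx as its argument
sh <xxx         # or feed xxx to the shell as its input

chmod +x xxx    # mark it executable ...
./xxx           # ... and it runs as a command of its own
```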


III. DOCUMENT PREPARATION


UNIX is extensively used for document
preparation. There are three major formatting
programs, that is, programs which produce a text
with justified right margins, automatic page
numbering and titling, automatic hyphenation,
and the like. The simplest of these formatters is
roff, which in fact is simple enough that if you 
type almost any text into a file and “roff” it, you 
will get plausibly formatted output. You can do 
better with a little knowledge, but basically it’s 
easy to learn and use. We'll get back to roff 
shortly. 


nroff is similar to roff but does much less 
for you automatically. It will do a great deal
more, once you know how to use it. 


Both roff and nroff are designed to produce 
output on terminals, line-printers, and the like.
The third formatter, troff (pronounced "tee-roff"),
instead drives a Graphic Systems phototypesetter,
which produces very high quality output
on photographic paper. This paper was
printed on the phototypesetter by troff. 


Because nroff and troff are relatively hard 
to learn to use effectively, several “packages” of 
canned formatting requests are available which 
let you do things like paragraphs, running titles, 
multi-column output, and so on, with little effort. 
Regrettably, details vary from system to system.


ROFF 


The basic idea of roff (and of nroff and 
troff, for that matter) is that the text to be
formatted contains within it "formatting
commands" that indicate in detail how the formatted
text is to look. For example, there might be
commands that specify how long lines are,
whether to use single or double spacing, and
what running titles to use on each page. In
general, you don't have to spell out all of the
possible formatting details. Most of them have
"default values", which you will get if you say
nothing at all. For example, unless you take special
precautions, you'll get single-spaced output,
65-character lines, justified right margins, and 58


text lines per page when you roff a file. This is 
the reason that roff is so simple — most of the 
decisions have already been made for you. 


Some things do have to be done, however. 
If you want a document broken into paragraphs, 
you have to tell roff where to add the extra
blank lines. This is done with the ".sp"
command:

this is the end of one paragraph.
.sp
This begins the next paragraph ...

In roff (and in nroff and troff), formatting
commands consist of a period followed by two
letters, and they must appear at the beginning of
a line, all by themselves. The ".sp" command
tells roff to finish printing any of the previous
line that might be still unprinted, then print a
blank line before continuing. You can have
more space if you wish; ".sp 2" asks for 2
spaces, and so on.

If you simply want to ensure that subsequent
text appears on a fresh output line, you
can use the command ".br" (for "break")
instead of ".sp".


Most of the other commonly-used roff
commands are equally simple. For example you
can center one or more lines with the ".ce"
command.

.ce
Title of Paper
.sp 2

causes the title to be centered, then followed by
two blank lines. As with ".sp", ".ce" can be
followed by a number; in that case, that many
input lines are centered.

".ul" underlines lines, and can also be
followed by a number:

.ce 2
.ul 2
An Earth-shaking Paper
.sp
John Q. Scientist

will center and underline the two text lines.
Notice that the ".sp" between them is not part of
the line count.

You can get multiple-line spacing instead
of the default single-spacing with the ".ls"
command:

.ls 2

causes double spacing.




If you're typing things like tables, you will 
not want the automatic filling-up and
justification of output lines that is done by
default. You can turn this off with the command
".nf" (no-fill), and then back on again with ".fi"
(fill). Thus

this section is filled by default.
.nf
here lines will appear just
as you typed them,
no extra spaces, no moving of words.
.fi
Now go back to filling up output lines.

You can change the line-length with ".ll",
and the left margin (the indent) by ".in". These
are often used together to make offset blocks of
text:

.ll -10
.in +10
this text will be moved 10
spaces to the right and the
lines will also be shortened 10
characters from the right. The
"+" and "-" mean to change
the previous value by that
much. Now revert:
.ll +10
.in -10

Notice that ".ll +10" adds ten characters to the
line length, while ".ll 10" makes the line ten
characters long.

The ".ti" command indents (in either
direction) just like ".in", except for only one
line. Thus to make a new paragraph with a
10-character indent, you would say

.sp
.ti +10
New paragraph ...


You can put running titles on both top and 
bottom of each page, like this: 


.he "left top"center top"right top"
.fo "left bottom"center bottom"right bottom"

The header or footer is divided into three parts,
which are marked off by any character you like.
(We used a double quote.) If there's nothing
between the markers, that part of the title will
be blank. If you use a percent sign anywhere in
".he" or ".fo", the current page number will be
inserted. So to get centered page numbers with
dashes around them, at the top, use

.he ""- % -""

You can skip to the top of a new page at any
time with the ".bp" command; if ".bp" is
followed by a number, that will be the new page
number.

The foregoing is probably enough about
roff for you to go off and format most everyday
documents. Read roff (I) for more details.


Hints for Preparing Documents 


Most documents go through several ver- 
sions (always more than you expected) before 
they are finally finished. Accordingly, you 
should do whatever possible to make the job of 
changing them easy. 


First, when you do the purely mechanical 
operations of typing, type so subsequent editing 
will be easy. Start each sentence on a new line. 
Make lines short, and break lines at natural 
places, such as after commas and semicolons,
rather than randomly. Since most people change
documents by rewriting phrases and adding, 
deleting and rearranging sentences, these precau- 
tions simplify any editing you have to do later. 


The second aspect of making change easy 
is not to commit yourself to formatting details 
too early. For example, if you decide that each 
paragraph is to have a space and an indent of 10 
characters, you might type, before each, 


.sp
.ti +10


But what happens when later you decide that it 
would have been better to have no space and an 
indent of only 5 characters? It’s tedious indeed 
to go back and patch this up. 


Fortunately, all of the formatters let you 
delay decisions until the actual moment of run- 
ning. The secret is to define a new operation 
(called a macro), for each formatting operation 
you want to do, like making a new paragraph. 
You can say, in all three formatters,

.de PP
.sp
.ti +10
..

This defines ".PP" as a new roff (or nroff or troff)
operation, whose meaning is exactly

.sp
.ti +10

(The ".." marks the end of the definition.)
Whenever ".PP" is encountered in the text, it is
as if you had typed the two lines of the
definition in place of it.

The beauty of this scheme is that now, if
you change your mind about what a paragraph
should look like, you can change the formatted
output merely by changing the definition of
".PP" and re-running the formatter.

As a rule of thumb, for all but the most
trivial jobs, you should type a document in terms
of a set of macros like ".PP", and then define
them appropriately. As long as you have entered
the text in some systematic way, it can always be
cleaned up and re-formatted by a judicious com- 
bination of editing and macro definitions. The 
packages of formatting commands that we men- 
tioned earlier are simply collections of macros 
designed for particular formatting tasks. 


One of the main differences between roff 
and the other formatters is that macros in roff 
can only be lines of text and formatting
commands. In nroff and troff, macros may have
arguments, so they can have different effects
depending on how they are called (in exactly the 
same way that the “.sp’’ command has an argu- 
ment, the number of spaces you want). 


Miscellany 


In addition to the basic formatters, UNIX 
provides a host of supporting programs. eqn and 
neqn let you integrate mathematics into the text 
of a document, in a language that closely resem- 
bles the way you would speak it aloud. spell and 
typo detect possible spelling mistakes in a docu- 
ment. grep looks for lines containing a particular 
text pattern (rather like the editor’s context 
search does, but on a whole series of files). For 
example, 


grep "ing$" chap*


will find all lines ending in the letters “ing” in 
the series of files “chap*”. (It is almost always a 
good practice to put quotes around the pattern 
you're searching for, in case it contains charac- 
ters that have a special meaning for the shell.) 
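The "$" in the pattern anchors the match to the end of the line, and with more than one file grep prefixes each matching line with its file name. A small sketch for a modern POSIX sh; the two chapter files and their contents are invented for illustration.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'we are going\nto the meeting\nlater today\n' >chap1.1
printf 'nothing here\nsomething\n' >chap1.2

# Quote the pattern so the shell leaves it alone; "$" means end of line.
grep 'ing$' chap*
# chap1.1:we are going
# chap1.1:to the meeting
# chap1.2:something
```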


wc counts the words and (optionally) lines
in a set of files. tr translates characters into
other characters; for example it will convert upper
to lower case and vice versa. This translates
upper into lower:

tr "[A-Z]" "[a-z]"
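The bracketed form can still be tried today: a modern POSIX tr maps each bracket onto itself, so the translation is unchanged, and the bracket-free form does the same job. A minimal sketch, with the C locale set (an assumption) so that the ranges mean plain ASCII letters.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL     # plain ASCII ranges

echo 'Hello, UNIX World' | tr "[A-Z]" "[a-z]"   # prints: hello, unix world
echo 'Hello, UNIX World' | tr 'A-Z' 'a-z'       # the same result
```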


diff prints a list of the differences between
two files, so you can compare two versions of 
something automatically (which certainly beats 
proofreading by hand). sort sorts files in a 
variety of ways; cref makes cross-references; ptx 
makes a permuted index (keyword-in-context 
listing). 


Most of these programs are either indepen- 
dently documented (like eqn and neqn), or are 
sufficiently simple that the description in the 
UNIX Programmer's Manual is adequate explana- 
tion. 


IV. PROGRAMMING 


UNIX is a marvelously pleasant and productive
system for writing programs; productivity
seems to be an order of magnitude higher than 
on other interactive systems. 


There will be no attempt made to teach 
any of the programming languages available on 
UNIX, but a few words of advice are in order. 
First, UNIX is written in C, as is most of the ap- 
plications code. If you are undertaking anything 
substantial, C is the only reasonable choice. 
More on that in a moment. But remember that 
there are quite a few programs already written, 
some of which have substantial power. 


The editor can be made to do things that 
would normally require special programs on other
systems. For example, to list the first and last
lines of each of a set of files, say a book, you
could laboriously type

ed
e chap1.1
1p
$p
e chap1.2
1p
$p

etc. 


But instead you can do the job once and for all. 
Type 

ls chap* >temp 
to get the list of filenames into a file. Then edit 
this file to make the necessary series of editing 


commands (using the global commands of ed), 
and write it into “script”. Now the command 


ed <script 


will produce the same output as the laborious


hand typing. 

The pipe mechanism lets you fabricate 
quite complicated operations out of spare parts 
already built. For example, the first draft of the 
spell program was (roughly) 


cat ...      (collect the files)
| tr ...     (put each word on a new line,
             delete punctuation, etc.)
| sort       (into dictionary order)
| uniq       (strip out duplicates)
| comm       (list words found in text but
             not in dictionary)
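Here is one way to assemble that spare-parts pipeline with today's tools. It is a sketch for a modern POSIX sh, not the original spell program: the tr flags (-c to complement, -s to squeeze) and comm -23 are present-day POSIX usage, and the six-word dictionary and sample text are invented.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
LC_ALL=C; export LC_ALL             # predictable sort and comm order

# A tiny hypothetical dictionary, already sorted as comm requires.
printf 'a\ncat\nmat\non\nsat\nthe\n' >dict
printf 'The cat sat on the matt.\nA cat, a hat!\n' >text

cat text |
  tr 'A-Z' 'a-z' |          # fold to lower case
  tr -cs 'a-z' '\n' |       # each run of non-letters becomes one newline
  sort |                    # into dictionary order
  uniq |                    # strip out duplicates
  comm -23 - dict           # words in the text but not in the dictionary
# prints: hat, then matt
```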


Programming the Shell 


An option often overlooked by newcomers 
is that the shell is itself a programming language, 
and since UNIX already has a host of building-block
programs, you can sometimes avoid writing
a special purpose program merely by piecing
together some of the building blocks with shell 
command files. 


As an unlikely example, suppose you want 
to count the number of users on the machine 
every hour. You could type 


date 
who | wc -l

every hour, and write down the numbers, but
that is rather primitive. The next step is
probably to say

(date; who | wc -l) >>users

which uses ">>" to append to the end of the
file "users". (We haven't mentioned ">>"
before; it's another service of the shell.) Now all
you have to do is to put a loop around this, and
ensure that it's done every hour. Thus, place
the following commands into a file, say "count":

: loop
(date; who | wc -l) >>users
sleep 3600
goto loop


The command : is followed by a space and a la- 
bel, which you can then goto. Notice that it’s 
quite legal to branch backwards. Now if you is- 
sue the command 


sh count & 


the users will be counted every hour, and you 
can go on with other things. (You will have to 
use kill to stop counting.) 


If you would like “every hour” to be a 
parameter, you can arrange for that too: 


: loop 

(date; who | wc -l) >>users
sleep $1 

goto loop 


“$1” means the first argument when this pro- 
cedure is invoked. If you say 


sh count 60

it will count every minute. A shell program can
have up to nine arguments, "$1" through "$9".
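Present-day shells have no goto, so a modern version of "count" uses a while loop instead. This sketch assumes a POSIX sh; to keep the demonstration short it runs with an interval of 1 second instead of 60 and stops the loop with kill after a few seconds.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The "count" procedure, with while ... done in place of ": loop" / "goto loop".
cat >count <<'EOF'
while :
do
    (date; who | wc -l) >>users   # >> appends instead of overwriting
    sleep "$1"                    # interval in seconds: the first argument
done
EOF

sh count 1 &       # count every second ("sh count 60" would be every minute)
pid=$!
sleep 3            # let a few counts accumulate
kill "$pid"        # the loop never stops by itself
wait "$pid" 2>/dev/null
cat users
```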


The other aspect of programming is condi- 
tional testing. The if command can test condi- 
tions and execute commands accordingly. As a 
simple example, suppose you want to add to 
your login sequence something to print your 
mail if you have some. Thus, knowing that mail 
is stored in a file called ‘mailbox’, you could say 


if -r mailbox mail


This says “if the file ‘mailbox’ is readable, exe- 
cute the mail command.” 


As another example, you could arrange 
that the “count” procedure count every hour by 
default, but allow an optional argument to
specify a different time. Simply replace the "sleep
$1" line by

if $1x = x sleep 3600
if $1x != x sleep $1

The construction

if $1x = x

tests whether "$1", the first argument, was
present or absent.


More complicated conditions can be tested: 
you can find out the status of an executed com- 
mand, and you can combine conditions with 
‘and’, ‘or’, ‘not’ and parentheses — see if (I). 
You should also read shift (I) which describes
how to manipulate arguments to shell command 
files. 


Programming in C 

As we said, C is the language of choice: 
everything in UNIX is tuned to it. It is also a 
remarkably easy language to use once you get 
started. Sections II and III of the manual
describe the system interfaces, that is, how you 
do I/O and similar functions. 


You can write quite significant C programs 
with the level of I/O and system interface 
described in Programming in C: A Tutorial, if you 
use existing programs and pipes to help. For ex- 
ample, rather than learning how to open and 
close files you can (at least temporarily) write a 
program that reads from its standard input, and 
use cat to concatenate several files into it. This
may not be adequate for the long run, but for 
the early stages it’s just right. 


There are a number of supporting pro- 
grams that go with C. The C debugger, cdb, is 
marginally useful for digging through the dead
bodies of C programs. db, the assembly
language debugger, is actually more useful most 
of the time, but you have to know more about 
the machine and system to use it well. The most 
effective debugging tool is still careful thought, 
coupled with judiciously placed print statements. 


You can instrument C programs and thus 
find out where they spend their time and what 
parts are worth optimising. Compile the routines 
with the “-p” option; after the test run use prof 
to print an execution profile. The command 
time will give you the gross run-time statistics of 
a program, but it’s not super accurate or
reproducible.


C programs that don’t depend too much on 
special features of UNIX can be moved to the 
Honeywell 6070 and IBM 370 systems with modest
effort. Read The ccas C Library by M. E.
Lesk and B. A. Barres for details. 


Miscellany


If you have to use Fortran, you might consider
ratfor, which gives you the decent control
structures and free-form input that characterize
C, yet lets you write code that is still portable to
other environments. Bear in mind that UNIX
Fortran tends to produce large and relatively
slow-running programs. Furthermore, supporting
tools like db, prof, etc., are all virtually
useless with Fortran programs.


If you want to use assembly language (all
heavens forfend!), try the implementation
language LIL, which gives you many of the ad-
vantages of a high-level language, like decent
control flow structures, but still lets you get close
to the machine if you really want to.


If your application requires you to translate 
a language into a set of actions or another 
language, you are in effect building a compiler, 
though probably a small one. In that case, you
should be using the yacc compiler-compiler,
which helps you develop a compiler quickly. 


V. UNIX READING LIST
General:


UNIX Programmer’s Manual (Ken Thompson, 
Dennis Ritchie, and a cast of thousands). Lists 
commands, system routines and interfaces, file 
formats, and some of the maintenance pro- 
cedures. You can’t live without this, although 
you will probably only read section I. 


The Unix Time-sharing System (Ken Thompson, 
Dennis Ritchie). CACM, July 1974. An over- 
view of the system, for people interested in 
operating systems. Worth reading by anyone
who programs. Contains a remarkable number
of one-sentence observations on how to do 
things right. 


Document Preparation: 


A Tutorial Introduction to the UNIX Text Editor. 
(Brian Kernighan). Bell Laboratories internal
memorandum. Weak on the more esoteric uses 
of the editor, but still probably the easiest way to 
learn ed. 


Typing Documents on UNIX. (Mike Lesk). Bell
package to isolate the novice from the vagaries
of the formatting programs. If this specific pack- 
age isn't available on your system, something 
similar probably is. This one works with both 
nroff and troff. 


Programming:


Programming in C: A Tutorial (Brian Ker- 
no help at all with the interface to the system
beyond the simplest I/O. Should be read in con-

C Reference Manual (Dennis Ritchie). Bell La-
boratories internal memorandum. An excellent
reference, but a bit heavy going for the be- 
ginner, especially one who has never used a 
language like C. 
System Software.

1. Introduction


This paper describes how to write UNIX programs that interface with the operating system 
in a non-trivial way. This includes programs that use files by name, that do large amounts of 
input or output, that invoke other commands as they run, or that attempt to catch interrupts 
and other signals during execution.


The document collects material which is scattered throughout several sections of The UNIX
Programmer's Manual [1]. There is no attempt to be complete; only generally useful material is
dealt with. It is assumed that you will be programming in C, so you will have to be able to
read the language roughly up to the level of Programming in C: A Tutorial [2]. You should
also be familiar with UNIX itself to the level of UNIX for Beginners [3].


2. Program Arguments


When a C program is run as a command, the arguments on the command line are made
available to the main program as an argument count argc and an array of character strings
argv containing the arguments. By convention, argv[0] is the command name itself, so argc is
always greater than 0.


Here is a program that simply echoes its arguments back to the terminal. (This is much 
like the command echo.) 


main( argc, argv )
int argc;
char *argv[ ];
{
	int i;

	for( i=1; i < argc; i++ )
		printf( "%s ", argv[i] );
	putchar( '\n' );
}


main is called with two arguments, the argument count and the array of arguments. argv is a 
pointer to an array, whose individual elements are pointers to arrays of characters; each is
terminated by ‘\0’, so they can be treated as strings. argv[0] is the name of the command itself,
so we start by printing argv[1] and loop until we’ve printed them all. Each argv[i] is a
character array, so we use a ‘%s’ in the printf.
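For comparison, here is a sketch of the same loop in present-day ISO C. It is not the original program. To make the result inspectable, this version writes into a caller-supplied buffer with snprintf instead of printing on the terminal. echo_args is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper (modern ISO C): the echo loop from the text, writing
 * into out (outsz bytes) instead of the terminal.  argv[0], the command
 * name, is skipped; each argument is followed by a blank and the whole
 * line ends with a newline.  If the buffer is too small the output is cut
 * short, but it always stays NUL-terminated (outsz must be at least 1). */
void echo_args(int argc, char *argv[], char *out, size_t outsz)
{
    size_t used = 0;

    out[0] = '\0';
    for (int i = 1; i < argc; i++) {
        int n = snprintf(out + used, outsz - used, "%s ", argv[i]);
        if (n < 0 || (size_t)n >= outsz - used)
            return;                       /* buffer full */
        used += (size_t)n;
    }
    snprintf(out + used, outsz - used, "\n");
}
```

A real command would use the same loop with printf in main. The buffer version only spells out the spacing rules: a blank after every argument and one newline at the end.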


A common convention in UNIX programs is that an argument which begins with ‘-’ indi-
cates a flag or option of some sort. For example, suppose we want a program to be callable as


prog -abc arg1 arg2 ...


where the ‘-’ argument is optional; if it is present, it must be first and may be followed by any
combination of the options ‘a’, ‘b’, and ‘c’.


main( argc, argv )
int argc;
char *argv[ ];
{
	int aflag, bflag, cflag, i, c;

	aflag = bflag = cflag = 0;
	while( argc > 1 && argv[1][0] == '-' ) {
		for( i=1; (c=argv[1][i]) != '\0'; i++ )
			if( c=='a' )
				aflag++;
			else if( c=='b' )
				bflag++;
			else if( c=='c' )
				cflag++;
			else
				printf( "%c?\n", c );
		--argc;
		++argv;
	}
	/* ... rest of program ... */
}


The statements 
--argc;
++argv;
drop the first argument from the list and adjust the count, so after interpreting the flag argu- 
ment, the rest of the program is independent of whether or not it existed. This works because 
argv is a pointer which can be incremented. Notice also that, for greatest generality, the while 
and for loops in combination allow the user to write the options either as 
program -a -b -c
or as
program -abc


This degree of generality is unfortunately not very common; most commands require one form 
or the other. 
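The same while/for combination can be packaged as a function in modern C. This is a sketch under stated assumptions. The struct flags type and the name parse_flags are hypothetical. The function returns how many flag arguments it consumed, so the caller can skip them. It reports unknown letters on the error output (stderr) rather than the standard output.

```c
#include <stdio.h>

struct flags { int a, b, c; };   /* hypothetical: one counter per option letter */

/* Hypothetical helper (modern ISO C): consume leading "-xyz" arguments the
 * way the text's while/for pair does, so "-a -b -c" and "-abc" have the
 * same effect.  Returns the number of flag arguments consumed, not counting
 * argv[0].  Unknown letters are reported on stderr and otherwise ignored. */
int parse_flags(int argc, char *argv[], struct flags *f)
{
    int used = 0;

    f->a = f->b = f->c = 0;
    while (argc > 1 && argv[1][0] == '-') {
        for (int i = 1; argv[1][i] != '\0'; i++) {
            switch (argv[1][i]) {
            case 'a': f->a++; break;
            case 'b': f->b++; break;
            case 'c': f->c++; break;
            default:  fprintf(stderr, "%c?\n", argv[1][i]); break;
            }
        }
        --argc;          /* drop the flag argument, as in the text */
        ++argv;
        used++;
    }
    return used;
}
```

Current systems also provide the POSIX getopt routine (declared in <unistd.h>) for this job. It accepts both the separate and the grouped forms of the options.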


The argument count and the arguments are parameters to main. If you want to keep the 
arguments around so other routines can get at them, you will probably want to copy them to
external variables. 


3. Rudimentary Input and Output 


The next several sections will discuss various aspects of input/output, including how to 
create, open, and close files from programs. There are several ways to do most of these opera- 
tions; these sections are organized so easy things are described first. 


3.1. The “Standard Input” and “Standard Output” 


The simplest input mechanism is to read the “standard input,” which is generally the 
user’s terminal. The function getchar() returns the next input character each time it is called. 
Of course, a file may be substituted for the terminal by using the ‘<’ convention: if prog uses 
getchar, then the command line 


prog <file 
causes prog to read file instead of the terminal. prog itself knows nothing about where its
input is coming from. This is also true if the input comes from another program via the UNIX
pipe mechanism: 
otherprog | prog 

will provide the input for prog from the output of otherprog. 

getchar returns the value zero (often written as the null-character ‘\0’) when it en- 
counters the end of file (or an error) on whatever you are reading. Bear in mind that ‘\0’ may 
be a legitimate value in some contexts. If it is, you can’t use getchar; read ahead to the
discussion of getc.


In a similar manner, putchar(c) puts the character c on the “‘standard output,” which is 
also by default the terminal. The output can be captured on a file by using ‘>’: if prog uses 
putchar,


prog >outfile 

will write the output onto outfile instead of the terminal. And a pipe can be used: 
prog | otherprog 

puts the output of prog into the input of otherprog. 


The function printf, which formats output in various ways, uses putchar to finally print 
the output, so output produced by printf also finds its way to the standard output. 


A surprising number of programs read only one input and write one output; for such
programs I/O with getchar, putchar, and printf may be entirely adequate, and it is almost always
enough to get started. This is particularly true, given the UNIX pipe facility for connecting the
output of one program to the input of the next. For example, here is a complete program that
acts as a “filter” to strip out all ASCII control characters from its input (except for newline and
tab). 


main( ) {
	int c;
	while( c=getchar( ) )
		if( (c>=' ' && c<0177) || c=='\t' || c=='\n' )
			putchar(c);
	exit( 0 );
}


If it is necessary to treat multiple files, you can use cat to collect the files for you:
cat file1 file2 ... | ccstrip >output 


and thus avoid learning how to access files from a program. By the way, the call to exit at the 
end is not necessary to make the program work properly, but it assures that any caller of the 
program will see a normal termination status (conventionally 0) from the program when it 
completes. Section 7.3 discusses status returns in more detail. 
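Here is a sketch of ccstrip's loop in modern C, assuming standard stdio streams rather than the V6 getchar. One difference matters. Today getc returns EOF (a negative value) at end of file, not zero, so this version tests against EOF, and NUL bytes are filtered like any other control character. strip_controls is a hypothetical name. Counting the removed characters is an addition to the original, useful for the diagnostic discussed in Section 3.4.

```c
#include <stdio.h>

/* Hypothetical helper (modern ISO C): copy in to out, dropping ASCII
 * control characters other than tab and newline (DEL, 0177, is dropped
 * too).  Returns the number of characters removed. */
long strip_controls(FILE *in, FILE *out)
{
    int c;
    long removed = 0;

    while ((c = getc(in)) != EOF) {          /* EOF, not 0, ends the input */
        if ((c >= ' ' && c < 0177) || c == '\t' || c == '\n')
            putc(c, out);
        else
            removed++;
    }
    return removed;
}
```

A main that calls strip_controls(stdin, stdout), then writes the returned count with fprintf(stderr, ...), keeps the report out of the data stream.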


3.2. File Descriptors 


Before we go much further into our description of I/O, we have to talk about file descrip- 
tors. Any program which does any input or output does so by reading or writing files. This is 
true even though the file in question may actually be a device like the user’s terminal. Associ- 
ated with each file being used for input or output is a small non-negative integer called a “‘file 
descriptor;” whenever I/O is to be done on the file, the file descriptor is used to identify the 
file, instead of the name. (This is roughly analogous to the use of READ(5,...) and WRITE(6,...) in
Fortran.)


In the most general case, to do I/O on a file you have to 


Open the file for reading and/or writing: UNIX connects the name of the file with a file
descriptor which it generates and returns to you. An alternative form of open will 
create the file if it doesn’t exist. 


Read or write on the file. 

Close the file, which breaks the connection between name and descriptor. 
And finally, you may want to 

Buffer input and output if necessary for efficiency. 


In the simplest cases, like getchar and putchar, all opening and closing is done for you; 
you only have to worry about reading and writing the right things. In more complicated situa- 
tions, you have to do more work, but you gain flexibility. 


getchar and putchar depend on the fact that when the command interpreter (the shell) 
runs a program, it opens three files, with file descriptors 0, 1, and 2. All of these are normally 
connected to the terminal, so if you read 0 and write 1 or 2, I/O is done on the terminal. If
you use ‘<’ or ‘>’, the shell changes the default assignment for file 0 or 1 from your terminal
to the named file, and opens it for you. This way your program need not know where its input
comes from nor where its output goes, so long as it uses files 0 and 1. Naturally, getchar reads
0 and putchar writes 1. When the program terminates, the files are closed automatically.
(We'll get back to file 2 in Section 3.4.) 


3.3. Buffering the Standard Input and Output 


If you are producing large amounts of output, you may find that programs which use 
putchar are slow, because each call to it actually requires a system call. You can speed up such 
a program markedly by buffering the output. (Generally UNIX buffers input automatically, 
which is almost always what is wanted. See Section 6 for how to turn off buffering.) 


Buffering requires a little witchcraft. First we need a buffer area. This can be provided 
by referring to an external variable fout which is used by putchar for output. fout is actually a 
structure defined as part of putchar, but its internal structure needn’t concern us now. Second, 
we have to connect the standard output to fout. Lastly, when all processing is done, any
output remaining in the output buffer has to be flushed out; this is not done automatically.
Putting all this together gives


main( ) {
	extern int fout;
	fout = dup( 1 );	/* buffer standard output */
	[ processing ... ]
	flush( );	/* force out last buffer */
	exit( 0 );
}


This should probably be treated as black magic for now. Briefly, putchar buffers only if 
the file has a descriptor greater than 2. The call to dup creates a new file descriptor that refers 
to the same file as the original 1, but is guaranteed to be buffered. flush forces out everything 
that has collected in the buffer.
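In standard C the fout/dup/flush ritual has a direct counterpart. setvbuf asks stdio to fully buffer a stream, and fflush(stdout) forces out whatever has collected, playing the part of flush(). A minimal sketch follows; outbuf and buffer_stdout are hypothetical names. Unlike the fout scheme described above, exit in standard C flushes stdio buffers itself.

```c
#include <stdio.h>

static char outbuf[BUFSIZ];   /* hypothetical buffer area; BUFSIZ comes from <stdio.h> */

/* Hypothetical helper (modern ISO C): ask stdio to fully buffer standard
 * output using outbuf.  Call it before anything is written to stdout.
 * Returns 0 on success, nonzero if the request is refused. */
int buffer_stdout(void)
{
    return setvbuf(stdout, outbuf, _IOFBF, sizeof outbuf);
}
```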


As an aside, there is an external variable fin used by getchar much as putchar uses fout.


3.4. Diagnostic Output — The Error File 


If we are going to run our control-character stripping program, it might be nice to know 
whether it actually removed any characters. But at the same time we can’t just print a mes- 
sage, because that will be sent to the same place as the data itself. That is, if we say 


ccstrip <infile >outfile


we don’t want outfile to contain a line saying “there were 3 bad characters.” 


Here is how to get this kind of diagnostic information separated from the standard output 
and placed on the terminal regardless. Just as the file descriptors 0 and 1 are predefined, so is 
file descriptor 2. Unless you go out of your way to change it, output written on file 2 will find 
its way to your terminal. Thus in simple cases, you can simply set fout to 2 to direct output to 
the terminal. 


main( ) {
	extern int fout;
	flush( );	/* clear out standard output */
	fout = 2;
	printf( "%d bad characters\n", badchar );
	exit( 0 );
}


This can get clumsy if the diagnostic output is to show up during the running of the program 
instead of all at the end, because you have to flush output for one file before switching to
another, each time you switch. 


4. The Portable C Library 


The portable C library [3] was written by Mike Lesk to provide a set of high-level I/O 
routines that could be implemented on any machine with a C compiler, and thus permit some 
degree of program transferability. C programs which use this library for I/O can be moved, 
with essentially no change, between UNIX, GCOS, and IBM-TSO. At the same time, many of the
details of buffering, file access, etc., are hidden from the user. The routines are somewhat 
bigger and slower than the analogous routines that are a standard part of UNIX, but they do 
handle a number of things automatically which other packages do not. 


If you use programs from the portable C library, you have to ask specifically that it be 
searched when you compile or link-edit your program by writing ‘-lp’ at the end of the
arguments to the cc command:


cc prog.c -lp


4.1. File Opening and Closing 


Before a file can be read or written, it has to be opened with the routine copen. copen 
has two arguments, the file name, and the type of access wanted (read, write, or append). 
Thus: 


fd = copen( "/usr/pwk/foo", 'r' );


opens /usr/pwk/foo for reading. The value returned by copen is the file descriptor assigned
by UNIX, to be used later for reading the file. If this number is -1, an error has occurred, so
some defensive action has to take place. The usual code is 


if( (fd=copen(name,mode)) == -1 )
	error( "Can't open file", name );


where error is some message printer. 

mode is one of ‘r’, ‘w’, or ‘a’. The arguments ‘w’ and ‘a’ mean writing and appending. If 
the file is to be opened for writing, and if it doesn’t already exist, it will be created for you. If 
the argument is ‘w’ and if the file already exists, it will be truncated to zero length. If the ar- 
gument is ‘a’, you will write at the end of the file in either case. (And of course if any of this 
fails, you will get a -1 error return.)


When you finish processing a file, you should close it explicitly with 
cclose( fd );


where fd is the file descriptor handed you by copen. cclose will flush out any buffer contents 
remaining before closing the file; there is no flush command in the portable C library. Finally, 
when your program is done, you should call cexit which will close any open files and flush out 
their buffers; like exit, it then terminates the program and delivers its argument as termination 
status. By the way, there is a limit of 15 simultaneously open files, so if your program deals 
with many files it will have to call cclose explicitly. 
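Standard C's fopen takes the same three access letters, spelled as strings. The sketch below shows the correspondence, assuming standard C rather than the portable library; open_like_copen is a hypothetical name. There are two differences from copen. Failure is signalled by a NULL pointer instead of -1. The result is a FILE pointer, which is closed with fclose (like cclose, fclose flushes any pending output).

```c
#include <stdio.h>

/* Hypothetical helper (modern ISO C): map the portable library's copen
 * modes onto fopen.
 *   'r'  read an existing file
 *   'w'  write; create the file, or truncate it to zero length
 *   'a'  append at the end; create the file if it does not exist
 * Returns NULL (rather than -1) on failure or for an unknown mode. */
FILE *open_like_copen(const char *name, char mode)
{
    switch (mode) {
    case 'r': return fopen(name, "r");
    case 'w': return fopen(name, "w");
    case 'a': return fopen(name, "a");
    default:  return NULL;
    }
}
```

The usual defensive test then compares the result against NULL and sends any message to stderr.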


Let us illustrate these routines with a program modeled after grep. grep, the general 
purpose pattern finder, has several arguments, some of which are file names. If there are no 
file names, it reads the standard input. And it writes on the standard output. That is, 


grep pattern [optional list of input files] 
will print all input lines that contain “pattern”. Thus we write 


main( argc, argv )
int argc;
char *argv[ ];
{
	char line[1000], *pattern;
	int i, fd;

	if( argc < 2 ) {
		printf( 2, "Usage: grep pattern [file...]\n" );
		cexit( 1 );
	}
	pattern = argv[1];
	i = 2;
	do {
		if( argc == 2 )
			fd = 0;	/* use standard input */
		else if( (fd=copen(argv[i], 'r')) == -1 ) {
			printf( 2, "can't open %s\n", argv[i] );
			cexit( 1 );
		}
		while( gets( line, fd ) )
			if( match( line, pattern ) )
				puts( line );
		if( fd != 0 )
			cclose( fd );
	} while( ++i < argc );
	cexit( 0 );
}


First the arguments are validated. Then the files are gone through in order; each is 
opened (if possible), and scanned in the while loop. 


gets and puts are routines in the portable C library that read and write a line at a time. 
match is an unspecified routine (you write it!) that tells whether the line contains the pattern. 


Notice that we have written a call to printf with a first argument of 2. If the first argu- 
ment of printf is a small positive integer, this is assumed to be a file descriptor, and the output 
is sent there. (Caveat: this is true in the portable library only, unfortunately.) Thus any error 
messages go to the user’s terminal. 


There are a couple of other things to note in passing. First, the basic design of the pro- 
gram is that it allows input to be either from a set of files or from the standard input, and it 
writes on the standard output. This way the program can be used stand-alone or as part of a
pipeline. It is important to design and implement programs this way whenever possible.
Second, the program signals errors in two ways. The diagnostic output goes out onto the error
file so it finds its way to your terminal instead of disappearing down a pipeline or into a file.
Also the program returns a value when it calls cexit, so the success or failure of the
command can be tested from within another program that uses this one as a sub-process.
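The heart of the grep example can be sketched in standard C. The text leaves match for you to write. This version assumes plain substring search with strstr, which is much simpler than the real grep's pattern matching. grep_stream is a hypothetical name. It reads one stream and writes another, so a caller can pass stdin when there are no file names and otherwise open each named file with fopen.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (modern ISO C): copy to out every line of in that
 * contains pattern.  match() is assumed to be plain substring search
 * (strstr).  A line longer than 999 characters is examined in pieces, so
 * a match straddling a piece boundary can be missed.  Returns the number
 * of lines (pieces) printed. */
int grep_stream(FILE *in, FILE *out, const char *pattern)
{
    char line[1000];
    int hits = 0;

    while (fgets(line, sizeof line, in) != NULL)
        if (strstr(line, pattern) != NULL) {
            fputs(line, out);
            hits++;
        }
    return hits;
}
```

As in the text, the caller should write any "can't open" message on stderr and finish with exit(1) or exit(0), so that other programs can test the outcome.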


4.2. Character I/O


The portable library provides two routines for I/O of individual characters. cgetc and
cputc are quite analogous to getchar and putchar, except that they require an additional argu- 
ment to specify a file descriptor. Thus 


cgetc( 0 ); cputc( c, 1 );
reads the standard input and writes the standard output, while
cgetc( fd ); cputc( c, fd );


reads and writes fd. For convenience, getchar and putchar are also provided; they just call 
cgetc or cputc with the appropriate file descriptor argument. As an exercise you might write 
your own versions of gets and puts using cgetc and cputc. 
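Here is one possible answer to the exercise. It assumes standard C's getc and putc on FILE pointers rather than the portable library's cgetc and cputc on descriptors. line_get and line_put are hypothetical names. line_get discards the newline and any characters beyond n-1, and it returns 0 only at end of file with nothing read.

```c
#include <stdio.h>

/* Hypothetical helper (modern ISO C): read one line from fp into s,
 * storing at most n-1 characters (n must be at least 1).  The newline is
 * discarded, as are any characters that do not fit.  Returns 0 only at
 * end of file with nothing read, otherwise 1. */
int line_get(char *s, int n, FILE *fp)
{
    int c, i = 0;

    while ((c = getc(fp)) != EOF && c != '\n')
        if (i < n - 1)
            s[i++] = (char)c;
    s[i] = '\0';
    return !(c == EOF && i == 0);
}

/* Hypothetical helper: write s followed by a newline on fp. */
void line_put(const char *s, FILE *fp)
{
    while (*s)
        putc(*s++, fp);
    putc('\n', fp);
}
```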


4.3. Miscellaneous 


The portable C library contains several other goodies, the most useful of which is prob- 
ably the function scanf, which provides input format conversion similar to printf on output; it 
will convert strings to integers, floating point numbers, and so on. There is also a way to use 
printf to do in-core format conversion: 


printf( -1, s, format, ... );


will put its output in string s instead of a file. The ungetc function can be used to push
characters back onto the input stream for re-reading. And the ceof function can be used to test
explicitly for end-of-file, so data which contains null characters can be handled with cgetc. 
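Standard C keeps scanf's conversions in the scanf, fscanf and sscanf family. It does the in-core formatting that the portable library gets from printf(-1, s, ...) with snprintf instead. Below is a small sketch of input conversion, assuming standard C. sum_ints is a hypothetical name. It adds up the integers at the front of a string and uses the %n directive to learn how far each conversion advanced.

```c
#include <stdio.h>

/* Hypothetical helper (modern ISO C): add up the integers at the front of
 * s, stopping at the first thing that is not a number.  %n stores how many
 * characters the conversion consumed, so the scan can move forward. */
long sum_ints(const char *s)
{
    long total = 0;
    int value, used;

    while (sscanf(s, "%d%n", &value, &used) == 1) {
        total += value;
        s += used;
    }
    return total;
}
```

For example, sum_ints("10 20 -5") gives 25, and sum_ints("7 x 8") stops at the x and gives 7.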


5. Standard UNIX I/O


This section describes the I/O routines provided as part of “standard” UNIX. They do 
somewhat less for you than the portable library, but are more efficient, and perhaps more wide- 
ly available. The essential difference is that the user has to supply the buffer for each file ex- 
plicitly, and provide his own flushing of output before closing a file. These routines are 
described in sections II and III of the UNIX Programmer’s Manual [1].


Let us illustrate by writing a simplified version of cmp, a program that compares two files 
byte by byte. 


main( argc, argv )
int argc;
char *argv[ ];
{
	int c1, c2, byte;
	int buf1[259], buf2[259];

	if( argc != 3 )
		error( "Usage: cmp file1 file2" );
	if( fopen( argv[1], buf1 ) < 0 )
		error( "can't open %s", argv[1] );
	if( fopen( argv[2], buf2 ) < 0 )
		error( "can't open %s", argv[2] );
	for( byte=0; ; byte++ ) {
		c1 = getc( buf1 );
		c2 = getc( buf2 );
		if( c1 < 0 || c2 < 0 )
			break;
		if( c1 != c2 )
			printf( "%6u %3o %3o\n", byte, c1, c2 );
	}
	if( c1 == c2 )
		exit( 0 );
	if( c1 < 0 )
		printf( "eof 1\n" );
	else
		printf( "eof 2\n" );
	exit( 1 );
}

error( s1, s2 )
char *s1, *s2;
{
	printf( s1, s2 );
	printf( "\n" );
	exit( 2 );
}


Files are opened with fopen, whose two arguments are the filename and a buffer, which is al-
ways declared int[259]. The buffer is actually used as a structure by the I/O routines (for ex-
ample, the file descriptor goes into buf[0]), but we need not be concerned with that here; it is
sufficient to note that the buffer is the connection between all the routines concerned with a
file. As with copen, fopen returns a -1 if the access failed for any reason. error is a simple
routine to print out a message and exit with the appropriate status return.


getc reads the input using the buffer argument to tell it what file to read. On end of file,
getc returns the value -1, not zero, so it can be used to read data containing explicit zero
characters, where getchar and cgetc are not suitable.
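Here is a sketch of the same comparison in standard C. fopen now returns a FILE pointer instead of filling in an int[259] buffer, and getc signals end of file with EOF (a negative value, -1 on common systems). cmp_streams is a hypothetical name. It makes one deliberate change from the listing above: it returns 1 whenever any byte differs. The listing exits with status 0 if the files differ but have equal length.

```c
#include <stdio.h>

/* Hypothetical helper (modern ISO C): compare two streams byte by byte.
 * Each differing byte is reported on out as in the text: offset, then
 * the two bytes in octal.  If one stream ends first, "eof 1" or "eof 2"
 * is reported.  Returns 0 if the streams are identical, 1 otherwise. */
int cmp_streams(FILE *f1, FILE *f2, FILE *out)
{
    int c1, c2, differ = 0;
    unsigned long byte;

    for (byte = 0; ; byte++) {
        c1 = getc(f1);
        c2 = getc(f2);
        if (c1 == EOF || c2 == EOF)
            break;
        if (c1 != c2) {
            fprintf(out, "%6lu %3o %3o\n", byte, (unsigned)c1, (unsigned)c2);
            differ = 1;
        }
    }
    if (c1 != c2) {                 /* one stream ended before the other */
        fputs(c1 == EOF ? "eof 1\n" : "eof 2\n", out);
        differ = 1;
    }
    return differ;
}
```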

The situation for output is slightly more complicated. First, the output file may not exist. 
The routine fcreat allows for this:


fcreat( name, buf ) 
creates the file if it doesn’t exist; if it does exist, it is truncated to zero length. To write on the 
file, 

putc( ch, buf ) 
puts the character ch onto the file. When writing is done, call 

fflush( buf ) 
to force out any left-over output, then call 

close( buf[0] )
to close the file. Termination of the program, by calling exit or otherwise, closes all open files, 
but it does not flush the output buffers. 

As an aside, the way to delete a file is simply to call 

unlink( filename ); 

This returns -1 if the unlinking failed. Renaming a file uses both link and unlink:


if ( link( oldname, newname ) >= 0) 
unlink( oldname );


renames the file from “oldname” to “newname,” taking care not to delete the file if the link 
failed, as it would, for example, if “newname” already existed.
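The link-then-unlink idiom still works with today's POSIX link and unlink calls. A sketch, with rename_by_link as a hypothetical name: the old name is removed only if the new link was made, so a failure never loses the file. The POSIX rename() call does the job in one step, but unlike link it silently replaces an existing newname.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (POSIX): rename oldname to newname the way the text
 * does.  If newname already exists, link fails, the reason is printed
 * with perror, and oldname is left alone.  Returns 0 on success, -1 on
 * failure. */
int rename_by_link(const char *oldname, const char *newname)
{
    if (link(oldname, newname) < 0) {
        perror(newname);
        return -1;
    }
    return unlink(oldname);
}
```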


6. Low-Level I/O


The lowest level of I/O in UNIX provides no buffering or any other services; it is in fact a
direct entry into the operating system. You are entirely on your own, but on the other hand, 
you have the most control over what happens. And since the calls and usage are quite simple, 
this isn’t as bad as it sounds. 


6.1. I/O Proper


For the low-level I/O routines, files are identified directly by their file descriptors. Here is 
a simple version of cp, a program which copies one file to another.


main( argc, argv )
int argc;
char *argv[ ];
{
	int f1, f2, n;
	char buf[512];

	if( argc != 3 )
		error( "Usage: cp from to" );
	if( (f1=open(argv[1], 0)) < 0 )
		error( "can't open %s", argv[1] );
	if( (f2=creat(argv[2], 0666)) < 0 )
		error( "can't open %s", argv[2] );
	while( (n=read(f1, buf, 512)) > 0 )
		write( f2, buf, n );
	exit( 0 );
}


open is rather like fopen except that instead of a buffer, its second argument tells what sort of 
access is required. 0 implies reading, 1 implies writing, and 2 is both read and write. A -1 is
returned if any error takes place, for example if the file doesn’t exist. 


creat is similar to fcreat, except that the second argument is the protection mode in 
which the output file is to be created. “0666” is read and write permission for everyone. creat 
opens the file for writing only. 


I/O is done by read and write. In both cases, the first argument is the file descriptor re- 
turned by a previous open or creat. The second argument is the place where the data is to 
come from or go to. The third argument is the number of bytes to be transferred. On reading, 
the actual number of bytes returned may be less than this value. Zero bytes implies end of 
file, a -1 implies an error of some sort. For writing, the returned value is the number of bytes 
actually written; it is generally an error if this isn’t equal to the number supposed to be writ- 
ten. 


The routine close may be used to close files after you are done with them. Termination 
of the program via exit or return from the main program closes all files, but since only 15 files 
can be open at once, close has to be used when many files are read or written.
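The copying loop of cp can be written as a function over two descriptors, using the POSIX read and write calls that descend from the ones described here. copy_fd is a hypothetical name. Following the advice above, it treats a short write as an error, and it distinguishes a read error (-1) from end of file (0). The caller is assumed to open the files, for example with open(argv[1], O_RDONLY) and creat(argv[2], 0666).

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical helper (POSIX): copy everything readable from descriptor
 * from to descriptor to, 512 bytes at a time as in the text.  Returns the
 * number of bytes copied, or -1 on a read error or a short write. */
long copy_fd(int from, int to)
{
    char buf[512];
    ssize_t n;
    long total = 0;

    while ((n = read(from, buf, sizeof buf)) > 0) {
        if (write(to, buf, (size_t)n) != n)
            return -1;                  /* fewer bytes written than read */
        total += n;
    }
    return n < 0 ? -1 : total;          /* n == 0 means end of file */
}
```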


It is instructive to see how read and write can be used to construct special versions of
things like getchar, putchar, etc. For example, if you don’t want buffered input, you have to
write your own getchar:


getchar( ) {
	int c, n;

	c = 0;
	n = read( 0, &c, 1 );
	return( n>0 ? c : 0 );
}


At this point you might wonder why unbuffered input should be useful. In practice, it usually
doesn’t matter, but consider the following sequence of commands: 


ed stuff 
1,$s/  */ /g
w stuff
q
opr stuff


which uses the editor to replace all sequences of multiple blanks in “stuff” by a single blank, 
then prints “stuff” off-line. Now imagine this pair of commands in a shell command file, and 
finally consider what would happen if the editor read its standard input with buffering. Since 
the command file contains commands to both the editor and the shell, the editor, if it read 
with a large enough buffer, would consume not only its own commands but also the
following opr command which is intended for the shell. One can imagine complicated schemes
for pushing back unwanted input, but the simplest approach for the editor, and analogous
programs, is to read a single character at a time so they can’t disturb input not intended for them.
(Incidentally, the portable library’s ungetc function doesn’t solve this problem, since it just
pushes input back into an internal buffer which vanishes along with its caller on termination.)
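Here is a sketch of the same one-character reader using the POSIX read call on a modern system. It differs from the version above in two ways. First, it reads into an unsigned char, because reading one byte into an int gave the right value only on a machine like the PDP-11, where the first byte of an int is its low-order byte. Second, it returns EOF rather than 0 at end of file, so NUL bytes can be read. getchar_unbuffered is a hypothetical name. Like the original, it never reads past the character it returns, and that is what leaves the shell's following commands intact.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>      /* EOF */
#include <unistd.h>

/* Hypothetical helper (POSIX): read exactly one byte from fd, with no
 * buffering.  Returns the byte as an unsigned value (0 to 255), or EOF at
 * end of file or on error. */
int getchar_unbuffered(int fd)
{
    unsigned char c;

    return read(fd, &c, 1) == 1 ? c : EOF;
}
```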


Similarly you can use the fact that printf calls putchar explicitly to make your own
error-printing routines. If you provide a putchar like this


putchar(c) {
	write(2, &c, 1);
	return(c);
}


then calls to printf will write on the terminal, unbuffered. 


As another example, this code duplicates the putchar and flush routines described above
except that it always buffers:

char	buf[512];
int	fdes = 1;
int	nleft = 512;
char	*nextfree = &buf[0];

putchar( c )
{
	*nextfree++ = c;
	if( --nleft <= 0 )
		flush( );
}

flush( )
{
	write( fdes, buf, nextfree - buf );
	nleft = 512;
	nextfree = buf;
}
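The always-buffering pair can be rendered in modern C with POSIX write, as a sketch. The routines are renamed out_putc and out_flush (hypothetical names) so they do not collide with the standard library's putchar. The descriptor becomes a variable, out_fd, and the "buffer full" test compares pointers instead of keeping a separate nleft count.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical names throughout (POSIX).  Characters collect in buf and
 * are sent to out_fd with a single write when 512 have accumulated, or
 * when out_flush is called. */
static char buf[512];
static char *nextfree = buf;
int out_fd = 1;                      /* standard output by default */

/* Write out whatever has collected.  Returns 0 on success, -1 if the
 * write failed or was short; the buffer is emptied either way. */
int out_flush(void)
{
    ssize_t len = nextfree - buf;

    nextfree = buf;
    return (len > 0 && write(out_fd, buf, (size_t)len) != len) ? -1 : 0;
}

/* Buffer one character, flushing when the buffer fills.  Returns c, or
 * -1 if a needed flush failed. */
int out_putc(int c)
{
    *nextfree++ = (char)c;
    if (nextfree == buf + sizeof buf && out_flush() < 0)
        return -1;
    return c;
}
```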


6.2. Random Access 
The seek routine provides a way to move around in a file without actually reading or
writing.

seek( fd, offset, ptr ); 
forces the current position in the file whose descriptor is fd to move to position offset, which is 
taken with respect to the location in the file specified by ptr. ptr can be 0, 1, or 2 to specify an 
offset measured from the beginning, from the current position, or from the end of the file
respectively. For example, to append to a newly-opened file,

seek( fd, 0, 2 ); 

and to get back to the beginning (“rewind”),


seek( fd, 0, 0 );


With seek, it is possible to treat files more or less like large arrays, at the price of slower 
access. Here is a routine to read an arbitrary record from an arbitrary place in a file. 


get( fd, pos, buf, n )
int fd, pos, n;
char *buf;
{
	seek( fd, pos, 0 );	/* get to pos */
	n = read( fd, buf, n );
	return( n );
}


Since integers have only 16 bits, the offset specified is limited to 65,536; for this reason, ptr 
values of 3, 4, 5 cause seek to multiply the given offset by 512 (the number of bytes in one 
physical block) and then interpret ptr as if it were 0, 1, or 2 respectively. Thus to get to an ar- 
bitrary place in a large file you need two seeks, first one which selects the block, then one 
which has ptr equal to 1 and moves to the desired byte within the block.
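With the POSIX lseek call that replaced seek, offsets have type off_t, which on current systems is usually 64 bits wide. The block-multiplying ptr values 3, 4 and 5 are therefore unnecessary; SEEK_SET, SEEK_CUR and SEEK_END play the roles of 0, 1 and 2. Here is a sketch of get in that style, with get_record as a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open and the O_ flags, used by callers */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (POSIX): read up to n bytes starting at byte pos of
 * the file open on fd.  Returns the number of bytes read (fewer near the
 * end of the file, 0 at or past it), or -1 on error. */
ssize_t get_record(int fd, off_t pos, char *buf, size_t n)
{
    if (lseek(fd, pos, SEEK_SET) < 0)    /* get to pos */
        return -1;
    return read(fd, buf, n);
}
```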


6.3. Error Processing 


The routines discussed in this section, and in fact all the routines which are direct entries 
into the system can incur errors. Usually they indicate an error by returning a value of —1. 
Sometimes it is nice to know what sort of error occurred; for this purpose all these routines, 
when appropriate, leave an error number in the external cell errno. The meanings of the vari- 
ous error numbers are listed in the introduction to Section II of the UNIX Programmer's Manual, 
so your program can, for example, determine if an attempt to open a file failed because it did
not exist or because the user lacked permission to read it. Perhaps more commonly, you may
want to print out the reason for failure. The routine perror will print a message associated
with the value of errno; more generally, sys_errlist is an array of character strings which can
be indexed by errno and printed by your program.
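Here is a sketch of classifying an open failure by errno. It assumes POSIX open and the standard errno codes ENOENT (no such file) and EACCES (permission denied). why_cant_open and its 0 to 3 result codes are hypothetical. It copies the message from strerror, the standard C function that returns the text perror prints. Current C libraries may not declare sys_errlist at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (POSIX): try to open name for reading.  Returns 0
 * if it opened (msg is set to ""), 1 if the file does not exist, 2 if
 * permission was denied, 3 for any other error.  On failure the system's
 * message is copied into msg (n bytes, n at least 1). */
int why_cant_open(const char *name, char *msg, size_t n)
{
    int fd = open(name, O_RDONLY);

    if (fd >= 0) {
        close(fd);
        msg[0] = '\0';
        return 0;
    }
    int err = errno;                    /* save it before other calls */
    strncpy(msg, strerror(err), n - 1);
    msg[n - 1] = '\0';
    return err == ENOENT ? 1 : err == EACCES ? 2 : 3;
}
```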


7. Executing Commands From Programs 


It is often easier to use a program written by someone else instead of inventing one’s
own. This section describes how to call a command from within a running program. 


7.1. With the Portable C Library 


The portable C library routine system takes one argument, a command string exactly as
you would have typed it at the terminal (except for the new-line at the end), and executes it.
This is probably the easiest way to execute a command from within a running program. For
instance, to time-stamp the output of a program,


main( )
{
	system( "date" );
	/* rest of processing */
}


If the command string has to be built from pieces, the core-to-core formatting capabilities of 
printf may be useful (in the portable library only). 


7.2. With Standard UNIX 


If you’re not using the portable C library, or if you need finer control over what happens,
you will have to construct calls to other programs using the more primitive routines that the
portable library's system routine is based on. For no good reason, the standard library doesn’t
have an equivalent of system.


First, you can execute another program without returning, by using the routine execl
(described under exec in Section II of the manual). To print the date as the last action of a
running program, you can say


execl( "/bin/date", "date", 0 );


The first argument to execl is the file name of the command; you have to know where it’s
found. The second argument is conventionally the program name (that is, the last component
of the file name), but this is seldom used except as a place-holder. If the command takes
arguments, they are strung out after this, and the whole list is followed by a 0 to terminate it.




The execl call overlays your program with date, runs it, then exits. More realistically,
your program might fall into two or more phases that communicate only through temporary
files, like the assembler. Here it is natural to make the second pass simply an execl call from
the first.


The one exception to the statement that your program never gets control back occurs 
when there is an error, for example if the file can’t be found or is not executable. If you don’t 
know where date is located, say 


execl( "/bin/date", "date", 0 );
execl( "/usr/bin/date", "date", 0 );
printf( "Someone stole 'date'\n", 0 );


Another version of exec, execv, is useful when you don’t know in advance how many ar- 
guments there are going to be. The call is 


execv( filename, argp ); 


where argp is a list of pointers to the arguments; the last pointer must be followed by a 0 so
execv can tell where the list ends. As with execl, filename is the file in which the program is
found, and argp[0] is the name of the program.
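When the number of arguments is only known at run time, the argp array can be built on the fly. A sketch of one hypothetical way to do it, splitting a writable line into blank-separated words; it performs no quoting or metacharacter expansion, which is the shell's job:

```c
/* Split line (modified in place) into blank-separated words, storing a
   pointer to each in argp and ending the list with a null pointer, the
   shape execv expects. At most max-1 words are kept so there is always
   room for the terminator; max must be at least 1.
   Returns the number of words stored. */
int build_argp(char *line, char *argp[], int max)
{
    int n = 0;
    char *p = line;

    while (n < max - 1) {
        while (*p == ' ' || *p == '\t')     /* skip blanks between words */
            p++;
        if (*p == '\0')
            break;
        argp[n++] = p;                      /* start of a word */
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
        if (*p != '\0')
            *p++ = '\0';                    /* terminate the word */
    }
    argp[n] = 0;                            /* the terminating 0 for execv */
    return n;
}
```

A caller could then say execv( argp[0], argp ) if the first word is a full path name.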


Neither of these routines provides the niceties of normal command execution. There is
no automatic search of multiple directories; you have to know precisely where the command
is located. Nor do you get the expansion of metacharacters like ‘<’, ‘>’, ‘*’, ‘?’, and ‘[ ]’ in
the argument list. If you want these, use execl to invoke the shell sh, which then does all the
work. Construct a string that contains the complete command you would have typed at the
terminal, then say


execl( "/bin/sh", "sh", "-c", commandline, 0 );


The shell is always going to be at a fixed place, /bin/sh. Its argument ‘-c’ says to treat the
next argument as a whole command line, so it does just what you want. The only problem is
in constructing the right information in commandline.


7.3. Regaining Control 


So far what we've talked about isn’t really all that useful by itself. Now we show how to
regain control after running a program with execl or execv. Since these routines simply
overlay the new program on the old one, to save the old one requires that it first be split into two
copies; one of these can be overlaid, while the other waits for the new, overlaying program to
finish. The splitting is done by the routine called fork:


pid = fork( ); 
splits the program into two copies, both of which continue to run. The only difference
between the two is the value of pid. In one of these processes (the ‘child’), pid is zero; in
the other (the ‘parent’), pid is non-zero; it is the process number of the child. Thus the basic
way to call, and return from, another program is


if ( fork( ) == 0 )
	execl( "/bin/sh", "sh", "-c", command, 0 );


And in fact, except for handling errors, this is sufficient. The fork makes two copies of your
program. In the child, the value returned by fork is zero, so it calls execl, which does the
command and then dies. In the parent, fork returns non-zero, so it skips the execl.


More often, the parent wants to wait for the child to terminate, so output doesn't get 
scrambled. This can be done with 


if ( fork( ) == 0 )
	execl( ... );
wait( &status );


This still doesn’t handle any abnormal conditions, such as a failure of the execl or fork, or the
possibility that there might be more than one child running simultaneously. (The wait returns
the pid of the terminated child, if you want to check it against the value returned by fork.)
Finally, this fragment doesn’t deal with any funny behavior on the part of the child (which is
reported in status). Still, these three lines are the heart of the portable library’s system routine.
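Putting fork, execl, and wait together gives a rough equivalent of the portable library's system routine. A sketch assuming POSIX: waitpid waits for this particular child, which answers the concern about several children running at once, and the child uses _exit so that it does not flush stdio buffers it inherited from the parent. The name my_system and the exit status 127 for a failed exec (a shell convention) are our choices:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmd through /bin/sh -c and wait for it.
   Returns the wait status, or -1 if fork or waitpid fails. */
int my_system(const char *cmd)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;                               /* could not fork */
    if (pid == 0) {                              /* child */
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);                              /* exec failed */
    }
    if (waitpid(pid, &status, 0) == -1)          /* parent: wait for this child */
        return -1;
    return status;
}
```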


The status return word set by wait encodes in its low-order byte the system’s idea of the
child’s termination status; it is 0 for OK and non-zero to indicate various kinds of problems
like those mentioned in section 8. The high-order byte is taken from the argument of the call
to (c)exit which caused a normal termination of the child process. At the moment, the standard
command interpreter (the shell) isn’t fussy about termination status, but it is good coding
practice for all programs to return meaningful status; someday, they may be called by another
program which cares whether they worked right.
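The layout of the status word described above can be picked apart with shifts and masks. A sketch under that assumption (decode_status is a hypothetical name); portable code should use the WIFEXITED, WEXITSTATUS, and WIFSIGNALED macros from <sys/wait.h> rather than depend on the bit layout:

```c
/* Split a status word from wait into its two bytes: the low-order byte
   is the system's termination status (0 for a normal exit), the
   high-order byte is the argument the child passed to exit. */
void decode_status(int status, int *sysbits, int *exitcode)
{
    *sysbits  = status & 0377;          /* low-order byte */
    *exitcode = (status >> 8) & 0377;   /* high-order byte */
}
```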


When your program is called by the shell, the three file descriptors 0, 1, and 2 are set up
pointing at the right files, and all other possible file descriptors are available for use. When
you call another program, correct etiquette suggests making sure the same conditions hold.
Neither of the exec calls affects open files in any way. Remember, too, that both fork and
exec create processes whose address space is distinct from that of their caller. Some
buffer-flushing may be needed before using these calls. Conversely, if a caller buffers an input
stream, the callee will lose the read-ahead information. (Essentially the same syndrome was
discussed in §6.1.)


8. Signals — Interrupts and all that 


This section is concerned with how you can make your program deal gracefully with
signals from the outside world (like interrupts), and with program faults. Since there’s nothing
very useful that can be done from within C about program faults, which arise mainly from
illegal memory references or from execution of peculiar instructions, we'll discuss only the
outside-world signals: interrupt, which is sent when the DEL character is typed; quit, generated
by the FS character; and hangup, caused by hanging up the phone. When one of these events
occurs, the signal is sent to all processes which were started from the corresponding typewriter;
unless other arrangements have been made, the signal terminates the process. In the quit case,
a core image file is written, usually for debugging purposes.


The routine which alters the default action is signal, described in section II of [1]. It has
two arguments: the first names the signal, the second specifies how to treat it. If the second
argument is 1, the signal is ignored; if it is 0, the default action is restored. Thus


#define SIGINT 2

signal( SIGINT, 1 );

ignores interrupts, while

signal( SIGINT, 0 );


restores the default action of process termination. Such coding is seldom needed (though see 
below) because there is a command which runs another program with these three signals ig- 
nored: 


nohup program & 


runs program (with arguments if you like) in such a way that you can hang up on it without
fear. Usually you would follow the command with an ‘&’; otherwise your terminal will be
firmly tied up. If the command is going to run for a long time, you might also use nice:

nice nohup program &


Nice lowers the priority of program so it won't hog the machine. Incidentally, starting a
command with ‘&’ automatically causes interrupts and quits to be ignored, so you can
compute in the background, and edit and debug in the foreground without danger. However,
hangups will still terminate ‘&’ programs.


Finally, the second argument to signal may be the name of a function (which, incidentally,
has to be declared explicitly if the compiler hasn’t seen it already). In this case, the
named routine will be called when the signal occurs. Most commonly this facility is used to
allow the program to clean up unfinished business before terminating, for example to delete a
temporary file:


main( )
{
	extern int onintr( );

	if ( (signal( SIGINT, 1 ) & 1) == 0 )
		signal( SIGINT, onintr );
	/* Process ... */
}

onintr( )
{
	unlink( tempfile );
	exit( 100 );
}


Why the test and the double call to signal? It’s quite simple: suppose this ‘interactive’
program were run non-interactively, say under ‘&’ with input from a file. If it began by
announcing that all interrupts were to be sent to the onintr routine, that would in fact occur,
even if the user meant to interrupt only some foreground process he happened to be running.
The code as written depends on the fact that signal returns the previous value of its second
argument; the if clause asks whether interrupts were previously being ignored (value is odd,
e.g. 1) and only if not does it request the call to onintr. In other words, if interrupts were being
ignored when the program was called, they should still be ignored.
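In current C the values 1 and 0 are spelled SIG_IGN and SIG_DFL (from <signal.h>), and signal returns a function pointer, so the odd-value test becomes a comparison with SIG_IGN. A sketch of the same idiom with a hypothetical helper name:

```c
#include <signal.h>

/* Catch sig with handler only if it was not already being ignored.
   The first call sets SIG_IGN just long enough to learn the previous
   action. Returns 1 if the handler was installed, 0 if the signal
   stays ignored. */
int catch_unless_ignored(int sig, void (*handler)(int))
{
    if (signal(sig, SIG_IGN) != SIG_IGN) {
        signal(sig, handler);
        return 1;
    }
    return 0;
}
```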


A more sophisticated program may wish to intercept an interrupt and interpret it as a
request to stop what it is doing and return to its own command-processing loop. Think of the
editor; interrupting a long printout should not cause it to terminate and lose the work already
done. The outline of the code for this case is probably best written like this:


main( )
{
	extern int onintrup( );

	setexit( );
	if ( (signal( SIGINT, 1 ) & 1) == 0 )
		signal( SIGINT, onintrup );

	for ( ;; ) {
		/* main processing loop */
	}
}

onintrup( )
{
	printf( "\ninterrupt\n" );
	reset( );
}


When an interrupt occurs, a call is forced to the onintrup routine, which can print a message
(and perhaps set flags, etc.). reset is a non-local goto to the location after the last call to
setexit, so control (and the stack level) will pop back to the place in the main routine where
the signal is set up and the main loop entered. Notice, by the way, that signal gets called
again after an interrupt occurs. This is necessary; most signals are automatically reset to their
default action when they occur.


Some programs which want to detect signals simply can’t be stopped at an arbitrary point,
for example in the middle of updating a linked list. If the routine called on occurrence of a
signal sets a flag and then returns instead of calling exit or reset, execution will continue at
the exact point it was interrupted. The interrupt flag can then be tested at some convenient
point in the main loop.


There is one difficulty associated with this approach. Suppose the program is reading the
typewriter when the interrupt is sent. The specified routine is duly called; it sets its flag and
returns. If it were really true, as we said above, that ‘execution resumes at the exact point it
was interrupted,’ the program would continue reading the typewriter until the user typed
another line. This behavior might well be confusing, since the user might not know that the
program is reading; he presumably would prefer to have the signal take effect instantly. The
method chosen to resolve this difficulty is to terminate the typewriter read when execution
resumes after the signal, returning an error code which indicates what happened.


Thus programs which catch and resume execution after signals should be prepared for 
‘errors’ which are caused by interrupted system calls. (The ones to watch out for are reads 
from a typewriter, wait, and sleep.) A program whose onintrup program just sets intflag, resets 
the interrupt signal, and returns, should usually include code like the following when it reads 
the standard input: 


if ( getchar( ) == '\0' )
	if ( intflag )
		/* interrupt processing */
	else
		/* end-of-file or error processing */


A final subtlety to keep in mind becomes important when signal-catching is combined
with execution of other programs. Suppose your program catches interrupts, and also includes
a method (like ‘!’ in the editor) whereby other programs can be executed. Your code should
look something like this:


signal( SIGINT, 1 );		/* ignore interrupts */
if ( fork( ) == 0 )
	execl( ... );
wait( ... );
signal( SIGINT, onintrup );	/* restore interrupts */


Why is this? Again, it’s not obvious but not really difficult. Suppose the program you call
catches its own interrupts. If you interrupt the subprogram, it will get the signal and return to
its main loop, and probably read your typewriter. But so also will the calling program pop out
of its wait for the subprogram, and also read your typewriter. Having two processes reading
your typewriter is very unfortunate, since the system figuratively flips a coin to decide who
should get each line of input. A simple way out is to have the parent program ignore
interrupts until the child is done, as in the code above.
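A sketch of the shielded fork-and-wait sequence, assuming POSIX. It saves whatever SIGINT action was in effect instead of assuming onintrup, and the child puts that action back before calling exec, so the command sees interrupts normally unless the parent itself was started with them ignored. The name run_shielded is hypothetical:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmd through the shell while this process ignores SIGINT, then
   restore the previous action. Returns the child's wait status, or -1
   if fork or waitpid fails. */
int run_shielded(const char *cmd)
{
    int status = -1;
    void (*old)(int) = signal(SIGINT, SIG_IGN);   /* ignore interrupts */
    pid_t pid = fork();

    if (pid == 0) {                               /* child */
        signal(SIGINT, old);                      /* let the command see interrupts */
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == -1)
        status = -1;
    signal(SIGINT, old);                          /* restore interrupts */
    return status;
}
```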
1. Introduction

C is a computer language which offers a rich selection of operators and data types and the abil- 
ity to impose useful structure on both control flow and data. All the basic operations and data 
objects are close to those actually implemented by most real computers, so that a very efficient 
implementation is possible, but the design is not tied to any particular machine and with a little 
care it is possible to write easily portable programs. 


This manual describes the current version of the C language as it exists on the PDP-11,
the Honeywell 6000, the IBM System/370, and the Interdata 8/32. Where differences exist, it
concentrates on the PDP-11, but tries to point out implementation-dependent details. With few 
exceptions, these dependencies follow directly from the underlying properties of the hardware; 
the various compilers are generally quite compatible. 


2. Lexical conventions


Blanks, tabs, newlines, and comments as described below are ignored except as they serve to 
separate tokens. Some space is required to separate otherwise adjacent identifiers, keywords, 
and constants. 


If the input stream has been parsed into tokens up to a given character, the next token is
taken to include the longest string of characters which could possibly constitute a token. 


2.1 Comments 


The characters /* introduce a comment, which terminates with the characters */. Comments do 
not nest. 


2.2 Identifiers (Names) 


An identifier is a sequence of letters and digits; the first character must be alphabetic. The 
underscore ‘_’ counts as alphabetic. Upper and lower case letters are considered different. On
the PDP-11, no more than the first eight characters are significant, and only the first seven for 
external identifiers. 


2.3 Keywords 
The following identifiers are reserved for use as keywords, and may not be used otherwise: 


Warning: The data type name short is not recognized by the version 
of the C compiler that is distributed as part of PWB/UNIX Edition 1.0. 
int         extern      else
char        register    for
float       typedef     do
double      static      while
struct      goto        switch
union       return      case
long        sizeof      default
short       break       entry
unsigned    continue
auto        if


The entry keyword is not currently implemented by any compiler but is reserved for future 
use. Some implementations also reserve the word fortran. 


2.4 Constants 
There are several kinds of constants, as follows: 


2.4.1 Integer constants 


An integer constant consisting of a sequence of digits is taken to be octal if it begins with 0
(digit zero), decimal otherwise. The digits 8 and 9 have octal value 10 and 11 respectively. A
sequence of digits preceded by 0x or 0X (digit zero) is taken to be a hexadecimal integer. The
hexadecimal digits include a or A through f or F with values 10 through 15. A decimal
constant whose value exceeds the largest signed machine integer (32767 on the PDP-11) is taken to
be long; an octal or hex constant which exceeds the largest unsigned machine integer (0177777
or 0xFFFF on the PDP-11) is likewise taken to be long.
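The rules above can be modeled as a small converter from the spelling of a constant to its value. A sketch with a hypothetical name, const_value; it deliberately accepts the digits 8 and 9 in octal as this manual describes (a modern compiler rejects a constant such as 018), and it does not check for overflow or handle the l or L suffix:

```c
#include <ctype.h>

/* Value of an integer constant spelled in s: a leading 0x or 0X means
   hexadecimal, a leading 0 means octal, anything else decimal. */
long const_value(const char *s)
{
    long v = 0;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        for (s += 2; isxdigit((unsigned char)*s); s++)
            v = 16 * v + (isdigit((unsigned char)*s)
                              ? *s - '0'
                              : tolower((unsigned char)*s) - 'a' + 10);
    } else if (s[0] == '0') {
        for (; isdigit((unsigned char)*s); s++)
            v = 8 * v + (*s - '0');    /* '8' adds 010, '9' adds 011 */
    } else {
        for (; isdigit((unsigned char)*s); s++)
            v = 10 * v + (*s - '0');
    }
    return v;
}
```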


2.4.2 Explicit long constants 


A decimal, octal, or hexadecimal integer constant immediately followed by l (letter ell) or L is a
long constant, which, on the PDP-11, has 32 significant bits. As discussed below, on other 
machines integer and long values may be considered identical. 


2.4.3 Character constants 


A character constant is a sequence of characters enclosed in single quotes ('). Within a
character constant a single quote must be preceded by a backslash ‘\’. Certain non-graphic characters,
and ‘\’ itself, may be escaped according to the following table:
BS          \b
NL (LF)     \n
CR          \r
HT          \t
FF          \f
ddd         \ddd
\           \\


The escape ‘\ddd’ consists of the backslash followed by 1, 2, or 3 octal digits which are taken to 
specify the value of the desired character. A special case of this construction is ‘\0’ (not fol- 
lowed by a digit) which indicates the character NUL. If the character following a backslash is not 
one of those specified, the backslash vanishes. 
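The escape rules can likewise be sketched as a decoder for one escape sequence. The function decode_escape is hypothetical and covers only the cases in the table and text above; current compilers typically warn about an unknown escape rather than silently dropping the backslash:

```c
/* Decode one escape sequence. s points just past the backslash;
   *used receives how many characters of s were consumed. */
int decode_escape(const char *s, int *used)
{
    int v = 0, n = 0;

    switch (s[0]) {
    case 'b':  *used = 1; return '\b';
    case 'n':  *used = 1; return '\n';
    case 'r':  *used = 1; return '\r';
    case 't':  *used = 1; return '\t';
    case 'f':  *used = 1; return '\f';
    case '\\': *used = 1; return '\\';
    }
    while (n < 3 && s[n] >= '0' && s[n] <= '7') {   /* \d, \dd, or \ddd */
        v = 8 * v + (s[n] - '0');
        n++;
    }
    if (n > 0) {
        *used = n;
        return v;
    }
    *used = 1;
    return s[0];     /* the backslash vanishes; this also covers \' */
}
```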


The value of a single-character constant is the numerical value of the character in the 
machine’s character set (ASCII for the PDP-11). On the PDP-11 at most two characters are
permitted in a character constant and the second character of a pair is stored in the high-order byte
of the integer value. Character constants with more than one character are inherently 
machine-dependent and should be avoided. 


2.4.4 Floating constants 


A floating constant consists of an integer part, a decimal point, a fraction part, an e or E, and 
an optionally signed integer exponent. The integer and fraction parts both consist of a
sequence of digits. Either the integer part or the fraction part (not both) may be missing; 
either the decimal point or the e and the exponent (not both) may be missing. Every floating 
constant is taken to be double-precision. 


2.5 Strings 


A string is a sequence of characters surrounded by double quotes. A string has type ‘array
of characters’ and storage class ‘static’ (see below) and is initialized with the given characters. 
The compiler places a null byte ‘\0’ at the end of each string so that programs which scan the 
string can find its end. In a string, the character ‘"’ must be preceded by a ‘\’; in addition, the 
same escapes as described for character constants may be used. Finally, a ‘\’ and an immedi- 
ately following new-line are ignored. 


All strings, even when written identically, are distinct. 
3. Syntax notation


In the syntax notation used in this manual, syntactic categories are indicated by italic type, and 
literal words and characters in sans-serif type. Alternatives are listed on separate lines. An 
optional terminal or non-terminal symbol is indicated by the subscript ‘opt,’ so that 


{ expression_opt }


would indicate an optional expression in braces. The complete syntax is given in §16, in the
notation of YACC. 


4. What’s in a Name? 


C bases the interpretation of an identifier upon two attributes of the identifier: its storage class 
and its type. The storage class determines the location and lifetime of the storage associated 
with an identifier; the type determines the meaning of the values found in the identifier’s 
storage. 


There are four declarable storage classes: automatic, static, external, and register. 
Automatic variables are local to each invocation of a block, and are discarded upon exit from 
the block; static variables are local to a block, but retain their values upon reentry to a block 
even after control has left the block; external variables exist and retain their values throughout 
the execution of the entire program, and may be used for communication between functions, 
even separately compiled functions. Register variables are (if possible) stored in the fast regis- 
ters of the machine; like automatic variables they are local to each block and disappear on exit 
from the block. 


C supports several fundamental types of objects: 


Objects declared as characters (char) are large enough to store any member of the 
implementation’s character set, and if a genuine character is stored in a character variable, its 
value is equivalent to the integer code for that character. Other quantities may be stored into 
character variables, but the implementation is machine-dependent. On the PDP-11, characters
are stored as signed 8-bit integers, and the character set is ASCII. 


Up to three sizes of integer, declared short int, int, and long int are available. Longer 
integers provide no less storage than shorter ones, but the implementation may make either 
short integers, or long integers, or both equivalent to plain integers. ‘Plain’ integers have the 
natural size suggested by the host machine architecture; the other sizes are provided to meet 
special needs. On the PDP-11, short and plain integers are both represented in 16-bit 2’s
complement notation. Long integers are 32-bit 2’s complement.


Unsigned integers, declared unsigned, obey the laws of arithmetic modulo 2^n, where n is
the number of bits in the representation (16 on the PDP-11; long and short unsigned quantities
are not supported).


Single precision floating point (float) quantities, on the PDP-11, have magnitude in the
range approximately 10^±38 or 0; their precision is 24 bits or about seven decimal digits.


Double-precision floating-point (double) quantities on the PDP-11 have the same range as 
floats and a precision of 56 bits or about 17 decimal digits. Some implementations may make 
float and double synonymous. 


Because objects of these types can usefully be interpreted as numbers, they will be 
referred to as arithmetic types. Types char and int of all sizes will collectively be called integral 
types. Float and double will collectively be called floating types. 


Besides the fundamental arithmetic types there is a conceptually infinite class of derived 
types constructed from the fundamental types in the following ways: 


arrays of objects of most types; 

functions which return objects of a given type;

pointers to objects of a given type; 

structures containing a sequence of objects of various types; 

unions capable of containing any one of several objects of various types. 
In general these methods of constructing objects can be applied recursively. 


5. Objects and lvalues


An object is a manipulatable region of storage; an lvalue is an expression referring to an object.
An obvious example of an lvalue expression is an identifier. There are operators which yield
lvalues: for example, if E is an expression of pointer type, then *E is an lvalue expression
referring to the object to which E points. The name ‘lvalue’ comes from the assignment expression
‘E1 = E2’ in which the left operand E1 must be an lvalue expression. The discussion of each
operator below indicates whether it expects lvalue operands and whether it yields an lvalue.


6. Conversions 


A number of operators may, depending on their operands, cause conversion of the value of an
operand from one type to another. This section explains the result to be expected from such 
conversions. §6.6 summarizes the conversions demanded by most ordinary operators; it will be 
supplemented as required by the discussion of each operator. 


6.1 Characters and integers 


A character or a short integer may be used wherever an integer may be used. In all cases the
value is converted to an integer. Conversion of a short integer always involves sign extension;
short integers are signed quantities. Whether or not sign-extension occurs for characters is
machine dependent, but it is guaranteed that a member of the standard character set is
non-negative. On the PDP-11, character variables range in value from -128 to 127; a character
constant specified using an octal escape also suffers sign extension and may appear negative, for
example ‘\214’.


When a longer integer is converted to a shorter or to a char, it is truncated on the left. 
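Both conversions can be observed with fixed-width types. A sketch assuming an 8-bit two's-complement char as on the PDP-11; because whether plain char is signed varies among modern compilers, it uses int8_t and uint8_t explicitly, and the function names are hypothetical:

```c
#include <stdint.h>

/* Widening a signed 8-bit character to int sign-extends it. */
int widen_char(int8_t c)
{
    return c;
}

/* Narrowing keeps only the low-order 8 bits (truncation on the left);
   uint8_t is used so the result shows the surviving bit pattern. */
int narrow_to_char(int16_t x)
{
    return (uint8_t)x;
}
```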


6.2 Float and double 


All floating arithmetic in C is carried out in double-precision; whenever a float appears in an 
expression it is lengthened to double by zero-padding its fraction. When a double must be 
converted to float, for example by an assignment, the double is rounded before truncation to 
float length. 


6.3 Floating and integral 


Conversions of floating values to integral type tend to be rather machine-dependent. On the 
PDP-11, truncation is towards 0. The result is undefined if the value will not fit in the space
provided. 


Conversions of integral values to floating type are well behaved. Some loss of precision 
occurs if the destination lacks sufficient bits. 
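A quick check of both directions, assuming IEEE 754 single precision for float (a 24-bit significand, as on most current machines). For values that fit, ISO C also truncates toward 0; the function names are hypothetical:

```c
/* Floating to integral: the fraction is discarded, truncating toward 0. */
int to_int(double d)
{
    return (int)d;
}

/* Integral to floating: 2^24 + 1 needs 25 bits, so a 24-bit float
   cannot hold it exactly and the value is rounded. */
float to_float(long n)
{
    return (float)n;
}
```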


6.4 Pointers and integers 


An integer or long integer may be added to or subtracted from a pointer; in such a case the first 
is converted as specified in the discussion of the addition operator. 


Two pointers to objects of the same type may be subtracted; in this case the result is con- 
verted to an integer as specified in the discussion of the subtraction operator. 


6.5 Unsigned 

Whenever an unsigned integer and a plain integer are combined, the plain integer is converted 
to unsigned and the result is unsigned. The value (on the PDP-11) is the least unsigned integer
congruent to the signed integer (modulo 2^16). Because of the 2’s complement notation, this
conversion is conceptual and there is no actual change in the bit pattern. 


When an unsigned integer is converted to long, the value of the result is the same numer- 
ically as that of the unsigned integer. Thus the conversion amounts to padding with zeros on 
the left. 
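Plain unsigned is usually 32 bits today, so a sketch of the PDP-11 rules can model the 16-bit case with uint16_t from <stdint.h>. The function names are hypothetical:

```c
#include <stdint.h>

/* Signed to unsigned: the least unsigned value congruent mod 2^16. */
uint16_t as_unsigned16(int16_t i)
{
    return (uint16_t)i;
}

/* Unsigned to long: same numeric value, i.e. zero-padded on the left. */
long widen_unsigned16(uint16_t u)
{
    return (long)u;
}
```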


6.6 Arithmetic conversions 


A great many operators cause conversions and yield result types in a similar way. This pattern 
will be called the ‘usual arithmetic conversions.’ 


First, any operands of type char or short are converted to int, and any of type float are 
converted to double. 


Then, if either operand is double, the other is converted to double and that is the type of 
the result. 


Otherwise, if either operand is long, the other is converted to long and that is the type of 
the result. 


Otherwise, if either operand is unsigned, the other is converted to unsigned and that is
the type of the result. 


Otherwise, both operands must be int, and that is the type of the result. 
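The rules above amount to a short decision procedure. A sketch that encodes it directly, with a hypothetical enum for the type names; it follows this manual, not ISO C, which adds unsigned long and decides the long-with-unsigned case by the sizes of the types:

```c
/* Hypothetical encoding of the types named in the rules above. */
enum ctype { T_CHAR, T_SHORT, T_INT, T_UNSIGNED, T_LONG, T_FLOAT, T_DOUBLE };

/* First step: char and short become int, float becomes double. */
static enum ctype promote(enum ctype t)
{
    if (t == T_CHAR || t == T_SHORT)
        return T_INT;
    if (t == T_FLOAT)
        return T_DOUBLE;
    return t;
}

/* Result type of a binary operator under the usual arithmetic
   conversions as this manual states them. */
enum ctype usual_conversion(enum ctype a, enum ctype b)
{
    a = promote(a);
    b = promote(b);
    if (a == T_DOUBLE || b == T_DOUBLE)
        return T_DOUBLE;
    if (a == T_LONG || b == T_LONG)
        return T_LONG;
    if (a == T_UNSIGNED || b == T_UNSIGNED)
        return T_UNSIGNED;
    return T_INT;
}
```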


7. Expressions 


The precedence of expression operators is the same as the order of the major subsections of
this section (highest precedence first). Thus the expressions referred to as the operands of +
(§7.4) are those expressions defined in §§7.1-7.3. Within each subsection, the operators have
the same precedence. Left- or right-associativity is specified in each subsection for the opera- 
tors discussed therein. The precedence and associativity of all the expression operators is sum- 
marized in the collected grammar. 


Otherwise the order of evaluation of expressions is undefined. In particular the compiler 
considers itself free to compute subexpressions in the order it believes most efficient, even if 
the subexpressions involve side effects. Expressions involving a commutative and associative 
operator may be rearranged arbitrarily, even in the presence of parentheses; to force a particular 
order of evaluation an explicit temporary must be used. 


7.1 Primary expressions 
Primary expressions involving ., ->, subscripting, and function calls group left to right.

	primary-expression:
		identifier
		constant
		string
		( expression )
		primary-expression [ expression ]
		primary-expression ( expression-list_opt )
		primary-lvalue . identifier
		primary-expression -> identifier


expression-list: 
expression 
expression-list , expression 


An identifier is a primary expression, provided it has been suitably declared as discussed below. 
Its type is specified by its declaration. However, if the type of the identifier is ‘array of ...’, 
then the value of the identifier-expression is a pointer to the first object in the array, and the 
type of the expression is ‘pointer to ...’. Moreover, an array identifier is not an lvalue expression.
Likewise, an identifier which is declared ‘function returning ...’, when used except in the
function-name position of a call, is converted to ‘pointer to function returning ...’. 


A decimal, octal, character, or floating constant is a primary expression. Its type may be 
int, long, or double depending on its form. 


A string is a primary expression. Its type is originally ‘array of char’; but following the 
same rule given above for identifiers, this is modified to ‘pointer to char’ and the result is a 
pointer to the first character in the string. (There is an exception in certain initializers, see 
§8.6.) 


A parenthesized expression is a primary expression whose type and value are identical to 
those of the unadorned expression. The presence of parentheses does not affect whether the 
expression is an lvalue. 


A primary expression followed by an expression in square brackets is a primary expres- 
sion. The intuitive meaning is that of a subscript. Usually, the primary expression has type 
‘pointer to ...’, the subscript expression is int, and the type of the result is ‘...’. The
expression ‘E1[E2]’ is identical (by definition) to ‘*((E1)+(E2))’. All the clues needed to
understand this notation are contained in this section together with the discussions in §§7.1,
7.2, and 7.4 on identifiers, *, and + respectively; §14.3 below summarizes the implications.
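The definition can be checked directly. A sketch (element is a hypothetical name) that fetches an element with the pointer form instead of brackets:

```c
/* Fetch p[i] using the definition rather than the [] operator.
   Because + commutes, p[i], *((p)+(i)), and even i[p] all name
   the same element. */
int element(const int *p, int i)
{
    return *((p) + (i));
}
```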


A function call is a primary expression followed by parentheses containing a possibly 
empty, comma-separated list of expressions which constitute the actual arguments to the func- 
tion. The primary expression must be of type ‘function returning ...’, and the result of the 
function call is of type ‘...’. As indicated below, a hitherto unseen identifier followed
immediately by a left parenthesis is contextually declared to represent a function returning an integer;
thus in the most common case, integer-valued functions need not be declared. 


Any actual arguments of type float are converted to double before the call; any of type 
char or short are converted to int. 


In preparing for the call to a function, a copy is made of each actual parameter; thus, all 
argument-passing in C is strictly by value. A function may change the values of its formal 
parameters, but these changes cannot affect the values of the actual parameters. On the other 
hand, it is possible to pass a pointer on the understanding that the function may change the 
value of the object to which the pointer points. The order of evaluation of arguments is 
undefined by the language; take note that the various compilers differ. 
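A sketch of the difference between changing a formal parameter and changing an object through a pointer (set_copy and set_through are hypothetical names):

```c
/* Assigning to a formal parameter changes only the local copy. */
void set_copy(int n)
{
    n = 99;
    (void)n;
}

/* A pointer argument lets the function change the caller's object. */
void set_through(int *p)
{
    *p = 99;
}
```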


Recursive calls to any function are permitted. 
A primary expression followed by a dot followed by an identifier is an expression. The
first expression must be an lvalue naming a structure or union, and the identifier must name a
member of the structure or union. The result is an lvalue referring to the named member of 
the structure or union. 


A primary expression followed by an arrow (built from a ‘-’ and a ‘>’) followed by an
identifier is an expression. The first expression must be a pointer to a structure or a union and 
the identifier must name a member of that structure or union. The result is an lvalue referring 
to the named member of the structure or union to which the pointer expression points. 


Thus the expression ‘E1->MOS’ is the same as ‘(*E1).MOS’. Structures and unions are
discussed in §8.5. The rules given here for the use of structures and unions are not enforced
strictly, in order to allow an escape from the typing mechanism. See §14.1.
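A small modern-C sketch of both member operators; the structure ‘point’ is a hypothetical example. It shows that ‘pp->x’ and ‘(*pp).x’ name the same member and that either result can be assigned to.

```c
#include <assert.h>

struct point {
    int x, y;
};

void demo_members(void)
{
    struct point pt;
    struct point *pp = &pt;

    pt.x = 3;                    /* '.' selects a member of a structure lvalue */
    pt.y = 4;
    assert(pp->y == 4);          /* '->' selects it through a pointer         */
    assert(pp->x == (*pp).x);    /* E1->MOS is the same as (*E1).MOS          */
    pp->x = 9;                   /* the result of '->' is an lvalue           */
    assert(pt.x == 9);
}
```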


7.2 Unary operators
Expressions with unary operators group right-to-left. 


unary-expression:
* expression
& lvalue
- expression
! expression
~ expression
++ lvalue
-- lvalue
lvalue ++
lvalue --
( type-name ) expression
sizeof expression
sizeof ( type-name )


The unary * operator means indirection: the expression must be a pointer, and the result is an 
lvalue referring to the object to which the expression points. If the type of the expression is 
‘pointer to ...’, the type of the result is ‘...’. 


The result of the unary & operator is a pointer to the object referred to by the lvalue. If
the type of the lvalue is ‘...’, the type of the result is ‘pointer to ...’.


The result of the unary - operator is the negative of its operand. The usual arithmetic
conversions are performed. The negative of an unsigned quantity is computed by subtracting
its value from $2^n$, where $n$ is 16 on the PDP-11.


The result of the logical negation operator ! is 1 if the value of its operand is 0, 0 if the
value of its operand is non-zero. The type of the result is int. It is applicable to any arithmetic
type or to pointers.


The ~ operator yields the one’s complement of its operand. The usual arithmetic conver- 
sions are performed. The type of the operand must be integral. 
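A sketch in modern C of unary -, ! and ~; it assumes only <limits.h>. Because the width of unsigned int varies (16 bits on the PDP-11, commonly 32 today), the rule "subtract from $2^n$" is written with UINT_MAX, which equals $2^n - 1$.

```c
#include <assert.h>
#include <limits.h>

void demo_unary_operators(void)
{
    unsigned int u = 5;
    int zero = 0, seven = 7;
    int *p = 0;

    /* -u is 2^n - u, where n is the width of unsigned int. */
    assert(-u == UINT_MAX - u + 1);

    assert((!zero) == 1);       /* !0 is 1                        */
    assert((!seven) == 0);      /* ! of any non-zero value is 0   */
    assert((!p) == 1);          /* ! also applies to pointers     */

    assert(~0u == UINT_MAX);    /* one's complement: every bit flips */
    assert((~u & u) == 0);
}
```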


The object referred to by the lvalue operand of prefix ‘++’ is incremented. The value is
the new value of the operand, but is not an lvalue. The expression ‘++a’ is equivalent to
‘(a += 1)’. See the discussions of addition (§7.4) and assignment operators (§7.14) for
information on conversions.


The lvalue operand of prefix ‘--’ is decremented analogously to the ++ operator.


When postfix ‘++’ is applied to an lvalue the result is the value of the object referred to
by the lvalue. After the result is noted, the object is incremented in the same manner as for
the prefix ++ operator. The type of the result is the same as the type of the lvalue
expression.


When postfix ‘--’ is applied to an lvalue the result is the value of the object referred to
by the lvalue. After the result is noted, the object is decremented in the same manner as for the
prefix -- operator. The type of the result is the same as the type of the lvalue expression.
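The difference between prefix and postfix forms shows up in the value each expression yields. This modern-C sketch (function name hypothetical) records that value in ‘r’ after each step.

```c
#include <assert.h>

void demo_increment_decrement(void)
{
    int a = 5, b = 5, r;

    r = ++a;            /* prefix: a becomes 6; the value is the new value  */
    assert(r == 6 && a == 6);

    r = b++;            /* postfix: the value is the old value; b becomes 6 */
    assert(r == 5 && b == 6);

    r = --a;            /* prefix decrement: a becomes 5, value 5           */
    assert(r == 5 && a == 5);

    r = b--;            /* postfix decrement: value 6, then b becomes 5     */
    assert(r == 6 && b == 5);

    r = (a += 1);       /* what ++a is defined to mean                      */
    assert(r == 6 && a == 6);
}
```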


An expression preceded by the parenthesized name of a data type causes conversion of 
the value of the expression to the named type. The construction of type names is described in 
§8.7. 


The sizeof operator yields the size, in bytes, of its operand. (A byte is undefined by the
language except in terms of the value of sizeof. However in all existing implementations a 
byte is the space required to hold a char.) When applied to an array, the result is the total 
number of bytes in the array. The size is determined from the declarations of the objects in the 
expression. This expression is semantically an integer constant and may be used anywhere a 
constant is required. Its major use is in communication with routines like storage allocators and 
I/O systems. 


The sizeof operator may also be applied to a parenthesized type name. In that case it 
yields the size, in bytes, of an object of the indicated type. 


The construction ‘sizeof(type)’ is taken to be a unit, so the expression ‘sizeof(type)-2’ is
the same as ‘(sizeof(type))-2’.
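A modern-C sketch of the common sizeof uses described above. The macro NELEM is a hypothetical helper, not part of any library; it divides the total size of an array by the size of one element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element count = total bytes / bytes per element. */
#define NELEM(arr) (sizeof (arr) / sizeof (arr)[0])

static double table[10];

void demo_sizeof(void)
{
    char buf[4 * sizeof(int)];   /* sizeof is a constant, usable as an array bound */
    size_t n = sizeof(int) - 2;  /* parsed as (sizeof(int)) - 2 */

    assert(sizeof(char) == 1);                    /* a char occupies one byte  */
    assert(sizeof table == 10 * sizeof(double));  /* the whole array, in bytes */
    assert(NELEM(table) == 10);
    assert(sizeof buf == 4 * sizeof(int));
    assert(n + 2 == sizeof(int));
}
```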


7.3 Multiplicative operators 


The multiplicative operators *, /, and % group left-to-right. The usual arithmetic conversions
are performed.


multiplicative-expression: 
expression * expression 
expression / expression 
expression % expression


The binary * operator indicates multiplication. The * operator is associative and expressions 
with several multiplications at the same level may be rearranged. 


The binary / operator indicates division. When positive integers are divided truncation is
toward 0, but the form of truncation is machine-dependent if either operand is negative. In all
cases it is true that (a/b)*b + a%b is equal to a (if b is not 0). On the PDP-11, the remainder
has the same sign as the dividend.


The binary % operator yields the remainder from the division of the first expression by
the second. The usual arithmetic conversions are performed. On the PDP-11, the remainder
has the same sign as the dividend. The operands must not be floating.
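The identity relating / and % can be checked directly. This sketch assumes a C99 or later compiler, where truncation toward 0 is required for all signs, so the remainder always has the sign of the dividend, as on the PDP-11; the helper name is hypothetical.

```c
#include <assert.h>

/* True when (a/b)*b + a%b == a; b must not be 0. */
int div_identity_holds(int a, int b)
{
    return (a / b) * b + a % b == a;
}

void demo_division(void)
{
    assert(7 / 2 == 3 && 7 % 2 == 1);    /* positive operands truncate toward 0 */
    assert(div_identity_holds(7, 2));
    assert(div_identity_holds(-7, 2));
    assert(div_identity_holds(7, -2));
    /* C99 and later: the remainder takes the sign of the dividend. */
    assert(-7 / 2 == -3 && -7 % 2 == -1);
}
```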


7.4 Additive operators 


The additive operators + and - group left-to-right. The usual arithmetic conversions are
performed. There are some additional type possibilities for each operator.


additive-expression: 
expression + expression 
expression - expression


The result of the ‘+’ operator is the sum of the operands. A pointer to an object in an array 
and a value of any integral type may be added. The latter is in all cases converted to an address 
offset by multiplying it by the length of the object to which the pointer points. The result is a 
pointer of the same type as the original pointer, and which points to another object in the same 
array, appropriately offset from the original object. Thus if P is a pointer to an object in an 
array, the expression ‘P+1’ is a pointer to the next object in the array. 


No further type combinations are allowed. 


The + operator is associative and expressions with several additions at the same level 
may be rearranged. 


The result of the ‘-’ operator is the difference of the operands. The usual arithmetic
conversions are performed. Additionally, a value of any integral type may be subtracted from a
pointer, and then the same conversions as for addition apply.


If two pointers to objects of the same type are subtracted, the result is converted (by divi- 
sion by the length of the object) to an int representing the number of objects separating the 
pointed-to objects. This conversion will in general give unexpected results unless the pointers 
point to objects in the same array, since pointers, even to objects of the same type, do not 
necessarily differ by a multiple of the object-length. 
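A modern-C sketch of the scaling rules for pointer addition and subtraction within one array; the array contents are arbitrary. The last assertion compares the element count with the raw byte distance.

```c
#include <assert.h>

void demo_pointer_arithmetic(void)
{
    static long a[5] = { 10, 20, 30, 40, 50 };
    long *p = &a[1];
    long *q = &a[4];

    assert(*(p + 1) == 30);   /* p+1 points to the next long, not the next byte */
    assert(*(q - 2) == 30);   /* an integer may be subtracted from a pointer    */
    assert(q - p == 3);       /* pointer difference counts objects, not bytes   */
    assert((char *)q - (char *)p == 3 * (long)sizeof(long));  /* byte distance */
}
```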


7.5 Shift operators 


The shift operators << and >> group left-to-right. Both perform the usual arithmetic 
conversions on their operands, each of which must be integral. Then the right operand is con- 
verted to int; the type of the result is that of the left operand. The result is undefined if the 
right operand is negative or larger than the number of bits in the object. 


shift-expression:
expression << expression
expression >> expression


The value of ‘E1<<E2’ is E1 (interpreted as a bit pattern) left-shifted E2 bits; vacated bits are
0-filled. The value of ‘E1>>E2’ is E1 right-shifted E2 bit positions. The shift is guaranteed
to be logical (0-fill) if E1 is unsigned; otherwise it may be (and is, on the PDP-11) arithmetic
(fill by a copy of the sign bit).
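A sketch of the shift rules using only unsigned operands, for which the result is fully defined; the constants are arbitrary and fit in 16 bits, as on the PDP-11.

```c
#include <assert.h>

void demo_shifts(void)
{
    unsigned int one = 1;
    unsigned int high = 0x8000u;     /* top bit of a 16-bit word */

    assert((one << 4) == 16u);       /* vacated bits are 0-filled            */
    assert((0x0Fu << 4) == 0xF0u);
    assert((high >> 15) == 1u);      /* unsigned: guaranteed logical shift   */
    assert((0xF0u >> 4) == 0x0Fu);
    /* Right-shifting a negative signed value is implementation-defined,
       so portable code shifts unsigned operands. */
}
```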


7.6 Relational operators 


The relational operators group left-to-right, but this fact is not very useful; ‘a<b<c’ does not
mean what it seems to.


relational-expression: 
expression < expression 
expression > expression 
expression <= expression 
expression >= expression


The operators < (less than), > (greater than), <= (less than or equal to) and >= (greater 
than or equal to) all yield 0 if the specified relation is false and 1 if it is true. The type of the
result is int. The usual arithmetic conversions are performed. Two pointers may be compared, 
and the result depends on the relative locations in the address space of the pointed-to objects. 
Pointer comparison is portable only when the pointers point to objects in the same array. 
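The following modern-C sketch spells out why ‘a<b<c’ misleads: the first comparison yields 0 or 1, and only that value is compared with c. It also shows the portable case of pointer comparison, within one array.

```c
#include <assert.h>

void demo_relational(void)
{
    static int v[4];
    int a = 3, b = 2, c = 1;

    /* a > b > c groups as (a > b) > c, that is 1 > 1, which is 0. */
    assert(((a > b) > c) == 0);
    /* The test that was presumably intended: */
    assert((a > b && b > c) == 1);

    /* Pointers into the same array compare by position. */
    assert(&v[1] < &v[3]);
    assert((&v[0] >= &v[0]) == 1);
}
```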


7.7 Equality operators 


equality-expression:
expression == expression
expression != expression


The == (equal to) and the != (not equal to) operators are exactly analogous to the relational
operators except for their lower precedence. (Thus ‘a<b == c<d’ is 1 whenever a<b and
c<d have the same truth-value.)

A pointer may be compared to an integer, but the result is machine dependent unless the 
integer is the constant 0. A pointer to which 0 has been assigned is guaranteed not to point to 
any object, and will appear to be equal to 0; in conventional usage, such a pointer is considered 
to be null. 
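A short modern-C sketch of the null-pointer convention and of the precedence remark above; the variable names are arbitrary.

```c
#include <assert.h>

void demo_equality(void)
{
    static int obj;
    int *p = 0;          /* assigning 0 yields a null pointer */
    int a = 1, b = 2, c = 3, d = 4;

    assert(p == 0);      /* the null pointer compares equal to 0   */
    p = &obj;
    assert(p != 0);      /* a pointer to an actual object does not */

    /* a<b == c<d groups as (a<b) == (c<d): 1 when both truth-values agree. */
    assert(((a < b) == (c < d)) == 1);
    assert(((a > b) == (c < d)) == 0);
}
```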


7.8 Bitwise and operator 


and-expression: 
expression & expression 
The & operator is associative and expressions involving & may be rearranged. The usual
arithmetic conversions are performed; the result is the bit-wise ‘and’ function of the operands.
The operator applies only to integral operands.


7.9 Bitwise exclusive or operator 


exclusive-or-expression: 
expression ^ expression


The ^ operator is associative and expressions involving ^ may be rearranged. The usual
arithmetic conversions are performed; the result is the bit-wise ‘exclusive or’ function of the
operands. The operator applies only to integral operands.


7.10 Bitwise inclusive or operator


inclusive-or-expression: 
expression | expression


The | operator is associative and expressions with | may be rearranged. The usual arithmetic
conversions are performed; the result is the bit-wise ‘inclusive or’ function of its operands. 
The operator applies only to integral operands. 
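The three bitwise operators are typically used together on flag words. In this sketch the flag names F_READ, F_WRITE and F_EXEC are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only. */
#define F_READ  01
#define F_WRITE 02
#define F_EXEC  04

void demo_bitwise(void)
{
    int mode = F_READ | F_WRITE;       /* inclusive or: set bits    */

    assert(mode == 03);
    assert((mode & F_WRITE) != 0);     /* and: test a bit           */
    assert((mode & F_EXEC) == 0);
    mode = mode ^ F_READ;              /* exclusive or: toggle a bit */
    assert(mode == F_WRITE);

    assert((0x0C ^ 0x0A) == 0x06);     /* 1100 ^ 1010 == 0110 */
    assert((0x0C & 0x0A) == 0x08 && (0x0C | 0x0A) == 0x0E);
}
```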


7.11 Logical and operator 


logical-and-expression: 
expression && expression 


The && operator groups left-to-right. It returns 1 if both its operands are non-zero, 0 other- 
wise. Unlike &, && guarantees left-to-right evaluation; moreover the second operand is not 
evaluated if the first operand is 0. 


The operands need not have the same type, but each must have one of the fundamental 
types or be a pointer. The result is always int. 


7.12 Logical or operator 


logical-or-expression: 
expression || expression


The || operator groups left-to-right. It returns 1 if either of its operands is non-zero, and 0
otherwise. Unlike |, || guarantees left-to-right evaluation; moreover, the second operand is not
evaluated if the value of the first operand is non-zero.


The operands need not have the same type, but each must have one of the fundamental 
types or be a pointer. The result is always int. 
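Short-circuit evaluation is observable when the right operand has a side effect. This modern-C sketch uses a hypothetical counting helper, ‘touch’, to show which operands actually run; the first assertion shows the familiar null-pointer guard.

```c
#include <assert.h>

static int calls;                        /* counts evaluations of touch() */

static int touch(int v)
{
    calls++;
    return v;
}

void demo_logical(void)
{
    char *s = 0;

    /* *s is never evaluated, because s != 0 is false. */
    assert(!(s != 0 && *s == 'x'));

    calls = 0;
    assert((0 && touch(1)) == 0);        /* right operand skipped */
    assert((1 || touch(0)) == 1);        /* right operand skipped */
    assert(calls == 0);

    assert((2 && 3) == 1);               /* the result is int 1, not 3 */
    assert((touch(0) || touch(5)) == 1 && calls == 2);
}
```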


7.13 Conditional operator 


conditional-expression: 
expression ? expression : expression 


Conditional expressions group right-to-left. The first expression is evaluated and if it is
non-zero, the result is the value of the second expression, otherwise that of the third expression. If
possible, the usual arithmetic conversions are performed to bring the second and third expres- 
sions to a common type; otherwise, if both are pointers of the same type, the result has the 
common type; otherwise, one must be a pointer and the other the constant 0, and the result 
has the type of the pointer. Only one of the second and third expressions is evaluated. 
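A modern-C sketch of the three typing cases; ‘max_int’ and ‘first_or_null’ are hypothetical names. The sizeof check shows that an int arm and a double arm are converted to the common type double.

```c
#include <assert.h>

int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* One arm is a pointer, the other the constant 0: the result is a char *. */
char *first_or_null(char *buf, int len)
{
    return len > 0 ? buf : 0;
}

void demo_conditional(void)
{
    static char text[] = "abc";

    assert(max_int(3, 8) == 8 && max_int(-1, -4) == -1);
    assert(first_or_null(text, 3) == text);
    assert(first_or_null(text, 0) == 0);
    /* Arithmetic arms are brought to a common type, here double. */
    assert(sizeof(1 ? 1 : 2.0) == sizeof(double));
}
```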
7.14 Assignment operators


There are a number of assignment operators, all of which group right-to-left. All require an 
lvalue as their left operand, and the type of an assignment expression is that of its left operand.
The value is the value stored in the left operand after the assignment has taken place. The two 
parts of a compound assignment operator are separate tokens. 


assignment-expression:
lvalue = expression
lvalue += expression
lvalue -= expression
lvalue *= expression
lvalue /= expression
lvalue %= expression
lvalue >>= expression
lvalue <<= expression
lvalue &= expression
lvalue ^= expression
lvalue |= expression


Notice that the representation of the compound assignment operators has changed; formerly the 
‘=’ came first and the other operator came second (without any space). The compiler contin- 
ues to accept the previous notation. 


In the simple assignment with ‘=’, the value of the expression replaces that of the object 
referred to by the lvalue. If both operands have arithmetic type, the right operand is converted
to the type of the left preparatory to the assignment. 


The behavior of an expression of the form ‘E1 op= E2’ may be inferred by taking it as
equivalent to ‘E1 = E1 op (E2)’; however, E1 is evaluated only once. In += and -=, the
left operand may be a pointer, in which case the (integral) right operand is converted as
explained in §7.4; all right operands and all non-pointer left operands must have arithmetic
type.

The compiler currently allows a pointer to be assigned to an integer, an integer to a 
pointer, and a pointer to a pointer of another type. The assignment is a pure copy operation, 
with no conversion. This usage is nonportable, and may produce pointers which cause address- 
ing exceptions when used. However, it is guaranteed that assignment of the constant 0 to a 
pointer will produce a null pointer distinguishable from a pointer to any object. 
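A modern-C sketch of the ‘E1 op= E2’ rules: the right operand behaves as if parenthesized, E1 is evaluated only once (visible through the side effect in ‘a[i++]’), and a pointer on the left is advanced in units of the pointed-to object.

```c
#include <assert.h>

void demo_compound_assignment(void)
{
    int a[3], *p;
    int i = 0, x = 10;

    a[0] = 1; a[1] = 2; a[2] = 3;

    x -= 3;          /* x = x - (3) */
    assert(x == 7);
    x *= 2 + 1;      /* x = x * (2 + 1): E2 acts as if parenthesized */
    assert(x == 21);

    a[i++] += 5;     /* E1 is evaluated once, so i is incremented only once */
    assert(a[0] == 6 && i == 1);

    p = a;
    p += 2;          /* pointer left operand: 2 is scaled by sizeof(int) */
    assert(*p == 3);
}
```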


7.15 Comma operator 


comma-expression: 
expression , expression 


A pair of expressions separated by a comma is evaluated left-to-right and the value of the left 
expression is discarded. The type and value of the result are the type and value of the right 
operand. This operator groups left-to-right. In contexts where comma is given a special
meaning, for example in a list of actual arguments to functions (§7.1) and lists of initializers (§8.6),
the comma operator as described in this section can only appear in parentheses; for example,
‘f(a, (t = 3, t+2), c)’ has three arguments, the second of which has the value 5.
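The example call above can be run directly. In this sketch, ‘second’ is a hypothetical function that returns its second argument, which makes the value of the parenthesized comma expression visible; the loop shows another common use of the comma operator.

```c
#include <assert.h>

/* Hypothetical function: returns its second argument. */
int second(int a, int b, int c)
{
    (void) a;
    (void) c;
    return b;
}

void demo_comma(void)
{
    int t, i, j;

    /* Three arguments; the parenthesized comma expression is the second. */
    assert(second(1, (t = 3, t + 2), 9) == 5);
    assert(t == 3);

    /* Two variables stepped together in one loop header. */
    for (i = 0, j = 10; i < j; i++, j--)
        ;
    assert(i == 5 && j == 5);
}
```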


8. Declarations 


Declarations are used within function definitions to specify the interpretation which C gives to
each identifier; they do not necessarily reserve storage associated with the identifier.
Declarations have the form


declaration: 
decl-specifiers declarator-list_opt ;
The declarators in the declarator-list contain the identifiers being declared. The decl-specifiers
consist of a sequence of type and storage class specifiers.


decl-specifiers:
type-specifier decl-specifiers_opt
sc-specifier decl-specifiers_opt


The list must be self-consistent in a way described below. 


8.1 Storage class specifiers 
The sc-specifiers are: 
sc-specifier: 
auto 
static 
extern 
register 
typedef 


The typedef specifier does not reserve storage and is called a ‘storage class specifier’ only for
syntactic convenience; it is discussed in §8.8.


The meanings of the various storage classes were discussed in §4.


The auto, static, and register declarations also serve as definitions in that they cause an 
appropriate amount of storage to be reserved. In the extern case there must be an external 
definition (§10) for the given identifiers somewhere outside the function in which they are 
declared. 


A register declaration is best thought of as an auto declaration, together with a hint to 
the compiler that the variables declared will be heavily used. Only the first few (three, for the 
PDP-11) such declarations are effective. Moreover, only variables of certain types will be stored
in registers; on the PDP-11, they are int, char, or pointer. One restriction applies to register
variables: the address-of operator & cannot be applied to them. Smaller, faster programs can be
expected if register declarations are used appropriately, but future developments may render 
them unnecessary. 


At most one sc-specifier may be given in a declaration. If the sc-specifier is missing from 
a declaration, it is taken to be auto inside a function, extern outside. Exception: functions are 
always extern. 
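A modern-C sketch contrasting static and automatic storage inside functions; the function names are hypothetical. The static counter keeps its value between calls, while the auto variable starts over on each call; the register variable is only a hint, and its address may not be taken.

```c
#include <assert.h>

int next_id(void)
{
    static int counter = 0;    /* static: storage persists between calls */
    register int step = 1;     /* register: a hint; &step is not allowed */

    counter += step;
    return counter;
}

int fresh_each_time(void)
{
    int n = 0;                 /* auto, the default inside a function */

    n++;
    return n;
}

void demo_storage_classes(void)
{
    int first = next_id();

    assert(next_id() == first + 1);   /* the static counter kept its value */
    assert(fresh_each_time() == 1);
    assert(fresh_each_time() == 1);   /* the auto variable starts over     */
}
```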


8.2 Type specifiers 
The type-specifiers are 


type-specifier: 
char 
short 
int 
long 
unsigned 
float 
double 
struct-or-union-specifier 
typedef-name 


The words long, short, and unsigned may be thought of as adjectives; the following combina- 
tions are acceptable (in any order). 


short int
long int
unsigned int
long float

The meaning of the last is the same as double. Otherwise, at most one type-specifier may be 
given in a declaration. If the type-specifier is missing from a declaration, it is taken to be int. 


Specifiers for structures and unions are discussed in §8.5; declarations with typedef names 
are discussed in §8.8.


8.3 Declarators 


The declarator-list appearing in a declaration is a comma-separated sequence of declarators, 
each of which may have an initializer. 


declarator-list:
init-declarator
init-declarator , declarator-list


init-declarator:
declarator initializer_opt


Initializers are discussed in §8.6. The specifiers in the declaration indicate the type and storage 
class of the objects to which the declarators refer. Declarators have the syntax: 


declarator:
identifier
( declarator )
* declarator
declarator ( )
declarator [ constant-expression_opt ]


The grouping is the same as in expressions. 


8.4 Meaning of declarators


Each declarator is taken to be an assertion that when a construction of the same form as the 
declarator appears in an expression, it yields an object of the indicated type and storage class. 
Each declarator contains exactly one identifier; it is this identifier that is declared. 


If an unadorned identifier appears as a declarator, then it has the type indicated by the 
specifier heading the declaration. 


A declarator in parentheses is identical to the unadorned declarator, but the binding of 
complex declarators may be altered by parentheses. See the examples below. 


If a declarator has the form
* D


for D a declarator, then the contained identifier has the type ‘pointer to ...’, where ‘...’ is the
type which the identifier would have had if the declarator had been simply D.


If a declarator has the form 
D() 


then the contained identifier has the type ‘function returning ...’, where ‘...’ is the type which
the identifier would have had if the declarator had been simply D.


A declarator may have the form
D[constant-expression]


or
D[ ]


Such declarators make the contained identifier have type ‘array.’ If the unadorned declarator D
would specify a non-array of type ‘...’, then the declarator ‘D[i]’ yields a 1-dimensional array
with rank i of objects of type ‘...’. If the unadorned declarator D would specify an
n-dimensional array with rank $i_1 \times i_2 \times \cdots \times i_n$, then the declarator D[$i_{n+1}$] yields an
(n+1)-dimensional array with rank $i_1 \times i_2 \times \cdots \times i_n \times i_{n+1}$.


In the first case the constant expression is an expression whose value is determinable at 
compile time, and whose type is int. (Constant expressions are defined precisely in §15.) The 
constant expression of an array declarator may be missing only for the first dimension. This 
notation is useful when the array is external and the actual declaration, which allocates storage, 
is given elsewhere. The constant-expression may also be omitted when the declarator is fol- 
lowed by initialization. In this case the size is calculated from the number of initial elements 
supplied. 


An array may be constructed from one of the basic types, from a pointer, from a structure 
or union, or from another array (to generate a multi-dimensional array). 


Not all the possibilities allowed by the syntax above are actually permitted. The restric- 
tions are as follows: functions may not return arrays, structures or functions, although they may 
return pointers to such things; there are no arrays of functions, although there may be arrays of 
pointers to functions. Likewise a structure may not contain a function, but it may contain a 
pointer to a function. 


As an example, the declaration 
int i, *ip, f(), *fip(), (*pfi)();


declares an integer i, a pointer ip to an integer, a function f returning an integer, a function fip
returning a pointer to an integer, and a pointer pfi to a function which returns an integer. It is
especially useful to compare the last two. The binding of ‘*fip()’ is ‘*(fip())’, so that the
declaration suggests, and the same construction in an expression requires, the calling of a
function fip, and then using indirection through the (pointer) result to yield an integer. In the
declarator ‘(*pfi)()’, the extra parentheses are necessary, as they are also in an expression, to
indicate that indirection through a pointer to a function yields a function, which is then called.
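The declaration above can be exercised in modern C, where the empty parameter lists are written ‘(void)’. This is a sketch with hypothetical function bodies; the names i, ip, f, fip and pfi follow the example.

```c
#include <assert.h>

static int value = 42;

int f(void)    { return 1; }       /* f:   function returning int            */
int *fip(void) { return &value; }  /* fip: function returning pointer to int */

void demo_declarators(void)
{
    int i, *ip;
    int (*pfi)(void);              /* pfi: pointer to function returning int */

    ip = &i;
    *ip = 7;
    assert(i == 7);

    assert(*fip() == 42);          /* *fip() binds as *(fip())               */
    pfi = f;
    assert((*pfi)() == 1);         /* the parentheses are needed here too    */
}
```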


As another example,
float fa[17], *afp[17];
declares an array of float numbers and an array of pointers to float numbers. Finally,
static int x3d[3][5][7];


declares a static three-dimensional array of integers, with rank 3x5x7. In complete detail, x3d 
is an array of three items: each item is an array of five arrays; each of the latter arrays is an 
array of seven integers. Any of the expressions ‘x3d’, ‘x3d[i]’, ‘x3d[i][j]’, ‘x3d[i][j][k]’
may reasonably appear in an expression. The first three have type ‘array’, the last has type int. 
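The nesting of x3d can be confirmed with sizeof. This modern-C sketch also repeats one element access using the pointer form of subscripting from §7.1.

```c
#include <assert.h>

static int x3d[3][5][7];

void demo_x3d(void)
{
    assert(sizeof x3d          == 3 * 5 * 7 * sizeof(int));
    assert(sizeof x3d[0]       == 5 * 7 * sizeof(int));  /* an array of five arrays */
    assert(sizeof x3d[0][0]    == 7 * sizeof(int));      /* an array of seven ints  */
    assert(sizeof x3d[0][0][0] == sizeof(int));          /* an int                  */

    x3d[2][4][6] = 99;                                   /* the last element        */
    assert(*(*(*(x3d + 2) + 4) + 6) == 99);              /* the same element        */
}
```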


8.5 Structure and union declarations 


A structure is an object consisting of a sequence of named members. Each member may have 
any type. A union is an object which may, at a given time, contain any one of several 
members. Structure and union specifiers have the same form. 


struct-or-union-specifier:
struct-or-union { struct-decl-list }
struct-or-union identifier { struct-decl-list }
struct-or-union identifier


struct-or-union:
struct
union


The struct-decl-list is a sequence of declarations for the members of the structure or union:


struct-decl-list:
struct-declaration
struct-declaration struct-decl-list


struct-declaration:
type-specifier struct-declarator-list ;


struct-declarator-list:
struct-declarator
struct-declarator , struct-declarator-list


In the usual case, a struct-declarator is just a declarator for a member of a structure or union. 
A structure member may also consist of a specified number of bits. Such a member is also 
called a field; its length is set off from the field name by a colon. 


struct-declarator:
declarator
declarator : constant-expression
: constant-expression


Within a structure, the objects declared have addresses which increase as their declarations are 
read left-to-right. Each non-field member of a structure begins on an addressing boundary 
appropriate to its type. On the PDP-11 the only requirement is that non-characters begin on a
word boundary; therefore, there may be 1-byte, unnamed holes in a structure. Field members 
are packed into machine integers; they do not straddle words. A field which does not fit into 
the space remaining in a word is put into the next word. No field may be wider than a word. 
On the PDP-11, fields are assigned right-to-left. 


A struct-declarator with no declarator, only a colon and a width, indicates an unnamed 
field useful for padding to conform to externally-imposed layouts. As a special case, an 
unnamed field with a width of 0 specifies alignment of the next field at a word boundary. The 
‘next field’ presumably is a field, not an ordinary structure member, because in the latter case 
the alignment would have been automatic. 
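A modern-C sketch of field declarations, including an unnamed width-0 field; the structure ‘status’ is a hypothetical layout. Because bit order and word size are implementation-dependent, the sketch checks only stored values, never bit positions.

```c
#include <assert.h>

/* A hypothetical device status word, for illustration only. */
struct status {
    unsigned ready : 1;
    unsigned mode  : 3;
    unsigned       : 0;     /* unnamed, width 0: next field starts a new word */
    unsigned count : 4;
};

void demo_fields(void)
{
    struct status st;

    st.ready = 1;
    st.mode = 5;            /* fits in 3 bits              */
    st.count = 15;          /* the largest value in 4 bits */
    assert(st.ready == 1 && st.mode == 5 && st.count == 15);
}
```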


The language does not restrict the types of things that are declared as fields, but imple- 
mentations are not required to support any but integer fields. Moreover, even int fields may be 
considered to be unsigned. On the PDP-11, fields are not signed and have only integer values.


A union may be thought of as a structure all of whose members begin at offset 0 and 
whose size is sufficient to contain any of its members. At most one of the members can be 
stored in a union at any time. 
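The ‘structure whose members all begin at offset 0’ view of a union can be checked with offsetof from <stddef.h>. The union ‘word’ is a hypothetical example in modern C.

```c
#include <assert.h>
#include <stddef.h>

union word {
    int   i;
    char  c[sizeof(int)];
    float f;
};

void demo_union(void)
{
    union word w;

    assert(offsetof(union word, i) == 0);   /* every member begins at offset 0 */
    assert(offsetof(union word, f) == 0);
    assert(sizeof w >= sizeof w.i && sizeof w >= sizeof w.f);
    w.f = 1.5f;                              /* only one member is stored at a time */
    assert(w.f == 1.5f);
}
```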


A structure or union specifier of the second form, that is, one of 


struct identifier { struct-decl-list } 

union identifier { struct-decl-list }


declares the identifier to be the structure tag (or union tag) of the structure specified by the list.
A subsequent declaration may then use the third form of specifier, one of 


struct identifier 
union identifier 


Structure tags allow definition of self-referential structures; they also permit the long part of the 
declaration to be given once and used several times. It is however absurd to declare a structure 
or union which contains an instance of itself, as distinct from a pointer to an instance of itself. 


The names of members and tags may be the same as ordinary variables. However, names 
of tags and members must be mutually distinct. 




Two structures may share a common initial sequence of members; that is, the same 
member may appear in two different structures if it has the same type in both and if all previ- 
ous members are the same in both. (Actually, the compiler checks only that a name in two 
different structures has the same type and offset in both, but if preceding members differ the 
construction is nonportable.) 


A simple example of a structure declaration is 


struct tnode {
    char tword[20];
    int count;
    struct tnode *left;
    struct tnode *right;
};


which contains an array of 20 characters, an integer, and two pointers to similar structures.
Once this declaration has been given, the following declaration makes sense: 


struct tnode s, *sp;


which declares s to be a structure of the given sort and sp to be a pointer to a structure of the 
given sort. With these declarations, the expression 


sp->count
refers to the count field of the structure to which sp points;
s.left
refers to the left subtree pointer of the structure s. Finally,
s.right->tword[0]
refers to the first character of the tword member of the right subtree of s.
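The tnode declarations can be assembled into a runnable modern-C sketch. The node contents ("root", "right") are arbitrary choices. The three assertions evaluate exactly the three expressions discussed above.

```c
#include <assert.h>
#include <string.h>

struct tnode {
    char tword[20];
    int count;
    struct tnode *left;
    struct tnode *right;
};

void demo_tnode(void)
{
    struct tnode s, r, *sp;

    strcpy(r.tword, "right");
    r.count = 1;
    r.left = r.right = 0;

    strcpy(s.tword, "root");
    s.count = 2;
    s.left = 0;
    s.right = &r;

    sp = &s;
    assert(sp->count == 2);             /* count field of the structure sp points to  */
    assert(s.left == 0);                /* left subtree pointer of s                  */
    assert(s.right->tword[0] == 'r');   /* first char of tword in the right subtree   */
}
```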


8.6 Initialization 


A declarator may specify an initial value for the identifier being declared. The initializer is pre- 
ceded by ‘=’, and consists of an expression or a list of values nested in braces. 
initializer: 
= expression 
= { initializer-list } 
= { initializer-list , }


initializer-list: 
expression 
initializer-list , initializer-list 
{ initializer-list } 
The ‘=’ is a new addition to the syntax, intended to alleviate potential ambiguities. The
current compiler allows it to be omitted when the rest of the initializer is a very simple
expression (just a name, string, or constant) or when the rest of the initializer is enclosed in braces.


All the expressions in an initializer for a static or external variable must be constant
expressions, which are described in §15, or expressions which reduce to the address of a
previously declared variable, possibly offset by a constant expression. Automatic or register
variables may be initialized by arbitrary expressions involving previously declared variables.

When an initializer applies to a scalar (a pointer or an object of arithmetic type), it con- 
sists of a single expression, perhaps in braces. The initial value of the object is taken from the 
expression; the same conversions as for assignment are performed. 


When the declared variable is an aggregate (a structure or array) then the initializer con- 
sists of a brace-enclosed, comma-separated list of initializers for the members of the aggregate, 
written in increasing subscript or member order. If the aggregate contains subaggregates, this
rule applies recursively to the members of the aggregate. If there are fewer initializers in the
list than there are members of the aggregate, then the aggregate is padded with 0’s. It is not
permitted to initialize unions or automatic aggregates. Currently, the PDP-11 compiler also
forbids initializing fields in structures.

Braces may be elided as follows. If the initializer begins with a left brace, then the 
succeeding comma-separated list of initializers initializes the members of the aggregate; it is
erroneous for there to be more initializers than members. If, however, the initializer does not 
begin with a left brace, then only enough elements from the list are taken to account for the 
members of the aggregate; any remaining members are left to initialize the next member of the 
aggregate of which the current aggregate is a part. 

A final abbreviation allows a char array to be initialized by a string. In this case succes- 
sive members of the string initialize the members of the array. 


For example, 
int x[] = { 1, 3, 5 };


declares and initializes x as a 1-dimensional array which has three members, since no size was
specified and there are three initializers. 


float y[4][3] = {
    { 1, 3, 5 },
    { 2, 4, 6 },
    { 3, 5, 7 },
};


is a completely-bracketed initialization: 1, 3, and 5 initialize the first row of the array y[0],
namely y[0][0], y[0][1], and y[0][2]. Likewise the next two lines initialize y[1] and y[2]. The
initializer ends early and therefore y[3] is initialized with 0. Precisely the same effect could
have been achieved by


float y[4][3] = {
    1, 3, 5, 2, 4, 6, 3, 5, 7
};


The initializer for y begins with a left brace, but that for y[0] does not, therefore 3 elements
from the list are used. Likewise the next three are taken successively for y[1] and y[2]. Also,


float y[4][3] = {
    { 1 }, { 2 }, { 3 }, { 4 }
};


initializes the first column of y (regarded as a two-dimensional array) and leaves the rest 0. 
Finally, 
char msg[] = "Syntax error on line %s\n";


shows a character array whose members are initialized with a string. 
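All four initializer examples can be checked together in modern C. Static storage is used, in keeping with this compiler's rule against initializing automatic aggregates; ‘y2’ and ‘ycol’ are hypothetical names for the second and third forms of y.

```c
#include <assert.h>
#include <string.h>

static int x[] = { 1, 3, 5 };

static float y[4][3] = {          /* completely bracketed */
    { 1, 3, 5 },
    { 2, 4, 6 },
    { 3, 5, 7 },
};

static float y2[4][3] = {         /* inner braces elided: same effect as y */
    1, 3, 5, 2, 4, 6, 3, 5, 7
};

static float ycol[4][3] = {       /* first column only */
    { 1 }, { 2 }, { 3 }, { 4 }
};

static char msg[] = "Syntax error on line %s\n";

void demo_initialization(void)
{
    assert(sizeof x / sizeof x[0] == 3);       /* size taken from the initializers */
    assert(y[1][2] == 6 && y[3][0] == 0);      /* y[3] is padded with 0            */
    assert(memcmp(y, y2, sizeof y) == 0);
    assert(ycol[2][0] == 3 && ycol[2][1] == 0);
    assert(sizeof msg == strlen(msg) + 1);     /* includes the terminating null    */
}
```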


8.7 Type names


In two contexts (to specify type conversions explicitly, and as an argument of sizeof) it is 
desired to supply the name of a data type. This is accomplished using a ‘type name,’ which in 
essence is a declaration for an object of that type which omits the name of the object. 


type-name:
type-specifier abstract-declarator


abstract-declarator:
empty
( abstract-declarator )
* abstract-declarator
abstract-declarator ( )
abstract-declarator [ constant-expression_opt ]


To avoid ambiguity, in the construction 

( abstract-declarator )
the abstract-declarator is required to be nonempty. Under this restriction, it is possible to
identify uniquely the location in the abstract-declarator where the identifier would appear if the
construction were a declarator in a declaration. The named type is then the same as the type of the
hypothetical identifier. For example,

int

int *

int *[3]

int (*)[3]


name respectively the types ‘integer,’ ‘pointer to integer,’ ‘array of 3 pointers to integers,’ and 
‘pointer to an array of 3 integers.’ As another example, 


int i; 
sin((double) i);


calls the sin routine (which accepts a double argument) with an argument appropriately con- 
verted. 
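A modern-C sketch of type names in their two contexts, casts and sizeof. It avoids sin so that no math library is needed; instead a cast to double changes the division that follows. The variable names are hypothetical.

```c
#include <assert.h>

void demo_type_names(void)
{
    int i = 7;
    static int row[3];
    int (*pa)[3] = &row;                    /* pointer to an array of 3 integers */

    assert(i / 2 == 3);                     /* integer division                  */
    assert((double) i / 2 == 3.5);          /* convert first, then divide        */

    assert(sizeof(int *[3]) == 3 * sizeof(int *));  /* array of 3 pointers       */
    assert(sizeof *pa == 3 * sizeof(int));          /* what pa points to         */
    assert((char *)(pa + 1) - (char *)pa == (long)sizeof(int [3]));
}
```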


8.8 Typedef 


Declarations whose ‘storage class’ is typedef do not define storage, but instead define 
identifiers which can be used later as if they were type keywords naming fundamental or 
derived types. Within the scope of a declaration involving typedef, each of the identifiers 
appearing as part of any declarators therein becomes syntactically equivalent to type keywords
naming the type associated with the identifiers in the way described in §8.4. 


typedef-name: 
identifier 
For example, after 


typedef int MILES, *KLICKSP; 
typedef struct { double re, im; } complex;


the constructions 


MILES distance; 
extern KLICKSP metricp; 
complex z, *zp;


are all legal declarations; the type of distance is ‘int’, that of metricp is ‘pointer to int,’ and that
of z is the specified structure. zp is a pointer to such a structure.


Typedef does not introduce brand new types, only synonyms for types which could be
specified in another way. Thus in the example above distance is considered to have exactly the
same type as any other int variable.
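Here is a minimal sketch in modern C that reuses the declarations above. add_miles and magnitude_squared are hypothetical helpers. The code compiles without any casts precisely because MILES and KLICKSP are only synonyms for int and pointer to int.

```c
typedef int MILES, *KLICKSP;
typedef struct { double re, im; } complex;

/* Hypothetical helper: MILES is only a synonym for int, so its address
   can be stored in a plain int pointer without a cast. */
int add_miles(MILES a, int b)
{
    MILES distance = a;
    KLICKSP metricp = &distance;   /* "pointer to int" */
    int *q = metricp;              /* same type: no conversion needed */
    return *q + b;
}

/* Hypothetical helper: zp points to the structure type named complex. */
double magnitude_squared(const complex *zp)
{
    return zp->re * zp->re + zp->im * zp->im;
}
```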
9. Statements
Except as indicated, statements are executed in sequence. 


9.1 Expression statement 

Most statements are expression statements, which have the form 
expression ;

Usually expression statements are assignments or function calls. 


9.2 Compound statement, or block 
So that several statements can be used where one is expected, the compound statement (also, 
and equivalently, called ‘block’) is provided: 
compound-statement:
    { declaration-list_opt statement-list_opt }

declaration-list:
    declaration
    declaration declaration-list

statement-list:
    statement
    statement statement-list


If any of the identifiers in the declaration-list were previously declared, the outer declaration is
pushed down for the duration of the block, after which it resumes its force.


Any initializations of auto or register variables are performed each time the block is 
entered at the top. It is currently possible (but a bad practice) to transfer into a block; in that 
case the initializations are not performed. Initializations of static variables are performed only 
once when the program begins execution. Inside a block, external declarations do not reserve
storage, so initialization is not permitted.
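The scope and initialization rules for a block can be checked with a small sketch. shadow_demo and count_calls are hypothetical names, and the code assumes a modern C compiler.

```c
/* Hypothetical demo: returns inner * 10 + outer to show which x each
   use sees. */
int shadow_demo(void)
{
    int x = 1;
    int inner;
    {
        int x = 2;          /* hides the outer x for the rest of this block */
        inner = x;
    }
    return inner * 10 + x;  /* the outer x has resumed its force */
}

/* Hypothetical demo: the static is initialized once, the auto on
   every entry to the block. */
int count_calls(void)
{
    static int calls = 0;   /* initialized only once */
    int fresh = 0;          /* re-initialized each time the block is entered */
    fresh++;
    calls++;
    return calls * 10 + fresh;
}
```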


9.3 Conditional statement 
The two forms of the conditional statement are 


if ( expression ) statement 
if ( expression ) statement else statement


In both cases the expression is evaluated and if it is non-zero, the first substatement is
executed. In the second case the second substatement is executed if the expression is 0. As usual
the ‘else’ ambiguity is resolved by connecting an else with the last encountered elseless if.
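The following sketch illustrates the else rule using a hypothetical function name. The else attaches to the inner if, so when a is zero neither assignment runs. Modern compilers may warn about the ambiguity and suggest braces.

```c
/* Hypothetical demo: the else belongs to "if (b)", the last elseless if. */
int dangling_else(int a, int b)
{
    int result = 0;
    if (a)
        if (b)
            result = 1;
        else
            result = 2;   /* runs only when a is non-zero and b is zero */
    return result;
}
```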


9.4 While statement 
The while statement has the form 
while ( expression ) statement 


The substatement is executed repeatedly so long as the value of the expression remains non- 
zero. The test takes place before each execution of the statement. 


9.5 Do statement 
The do statement has the form 
do statement while ( expression ) ; 


The substatement is executed repeatedly until the value of the expression becomes zero. The 
test takes place after each execution of the statement. 
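The difference between testing before and after the body shows up when the condition is false from the start. Here is a minimal sketch with hypothetical function names:

```c
/* Hypothetical demo: the while test comes before the body. */
int while_runs(int n)
{
    int runs = 0;
    while (n > 0) {
        runs++;
        n--;
    }
    return runs;
}

/* Hypothetical demo: the do test comes after the body, so it runs at
   least once. */
int do_runs(int n)
{
    int runs = 0;
    do {
        runs++;
        n--;
    } while (n > 0);
    return runs;
}
```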
9.6 For statement
The for statement has the form 
for ( expression-1_opt ; expression-2_opt ; expression-3_opt ) statement
This statement is equivalent to
expression-1 ;
while ( expression-2 ) {
    statement
    expression-3 ;
}

Thus the first expression specifies initialization for the loop; the second specifies a test, made 
before each iteration, such that the loop is exited when the expression becomes 0; the third 
expression typically specifies an incrementation which is performed after each iteration. 


Any or all of the expressions may be dropped. A missing expression-2 makes the implied
while clause equivalent to ‘while(1)’; other missing expressions are simply dropped from the
expansion above.
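The sketch below compares a for loop with the while expansion given above. It also shows a loop with expression-2 omitted. The function names are hypothetical.

```c
/* Hypothetical demo: sum of 0..n-1 with a for loop. */
int sum_for(int n)
{
    int s = 0, i;
    for (i = 0; i < n; i++)
        s += i;
    return s;
}

/* The same loop written as its while expansion. */
int sum_while(int n)
{
    int s = 0, i;
    i = 0;                 /* expression-1 */
    while (i < n) {        /* expression-2 */
        s += i;            /* statement */
        i++;               /* expression-3 */
    }
    return s;
}

/* Missing expression-2 acts like while(1); the loop exits via return. */
int first_square_at_least(int limit)
{
    int k;
    for (k = 0; ; k++)
        if (k * k >= limit)
            return k;
}
```

One caveat applies that the text above does not state. If the body contains continue, a for loop still evaluates expression-3 before the next test. A literal copy of the while expansion would skip it (compare the precise definition of continue in §9.9).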


9.7 Switch statement 


The switch statement causes control to be transferred to one of several statements depending 
on the value of an expression. It has the form 


switch ( expression ) statement


The usual arithmetic conversion is performed on the expression, but the result must be int. 
The statement is typically compound. Any statement within the statement may be labelled with 
one or more case prefixes as follows: 


case constant-expression :


where the constant expression must be int. No two of the case constants in the same switch
may have the same value. Constant expressions are precisely defined in §15.


There may also be at most one statement prefix of the form 
default : 


When the switch statement is executed, its expression is evaluated and compared with each 
case constant. If one of the case constants is equal to the value of the expression, control is 
passed to the statement following the matched case prefix. If no case constant matches the 
expression, and if there is a default prefix, control passes to the prefixed statement. If no case 
matches and if there is no default then none of the statements in the switch is executed. 


Case and default prefixes in themselves do not alter the flow of control, which continues
unimpeded across such prefixes. To exit from a switch, see break, §9.8.


Usually the statement that is the subject of a switch is compound. Declarations may
appear at the head of this statement, but initializations of automatic or register variables are
ineffective.
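The sketch below shows three behaviours: several case prefixes sharing one statement, break leaving the switch, and control falling through from one case to the next. char_class and fallthrough_count are hypothetical names. The /* fall through */ comments are only a convention that some modern compilers recognize to suppress a warning.

```c
/* Hypothetical demo: several case prefixes label one statement. */
int char_class(int c)
{
    int kind;
    switch (c) {
    case ' ':
    case '\t':
    case '\n':
        kind = 1;          /* white space */
        break;             /* leave the switch */
    case '0':
        kind = 2;
        break;
    default:
        kind = 0;          /* anything else */
        break;
    }
    return kind;
}

/* Hypothetical demo: without break, control continues past the next prefix. */
int fallthrough_count(int n)
{
    int steps = 0;
    switch (n) {
    case 3:
        steps++;
        /* fall through */
    case 2:
        steps++;
        /* fall through */
    case 1:
        steps++;
    }
    return steps;          /* no match and no default: nothing executed */
}
```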


9.8 Break statement 
The statement 
break ; 


causes termination of the smallest enclosing while, do, for, or switch statement; control passes 
to the statement following the terminated statement. 
9.9 Continue statement
The statement 
continue ; 


causes control to pass to the loop-continuation portion of the smallest enclosing while, do, or
for statement; that is, to the end of the loop. More precisely, in each of the statements


while (...) {
    ...
contin: ;
}

do {
    ...
contin: ;
} while (...);

for (...) {
    ...
contin: ;
}


a continue is equivalent to ‘goto contin’. (Following the ‘contin:’ is a null statement, §9.13.) 
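The following hypothetical sketch shows continue in a for loop. The continue skips the rest of the body, and because this is a for loop, the i++ step still runs before the next test.

```c
/* Hypothetical demo: sum only the odd values in a[0..n-1]. */
int sum_odd(const int *a, int n)
{
    int i, s = 0;
    for (i = 0; i < n; i++) {
        if (a[i] % 2 == 0)
            continue;      /* acts like goto contin, just before the '}' */
        s += a[i];
    }
    return s;
}
```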


9.10 Return statement 
A function returns to its caller by means of the return statement, which has one of the forms 
return ; 
return expression ;


In the first case the returned value is undefined. In the second case, the value of the expres- 
sion is returned to the caller of the function. If required, the expression is converted, as if by 
assignment, to the type of the function in which it appears. Flowing off the end of a function is 
equivalent to a return with no returned value. 
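Here is a short sketch of the conversion performed on the returned value. truncate_half and widen are hypothetical names. The code assumes a modern C compiler, where converting a double to int truncates toward zero.

```c
/* Hypothetical demo: the double quotient is converted to int on return. */
int truncate_half(int n)
{
    return n / 2.0;        /* converted as if by assignment: truncates */
}

/* Hypothetical demo: the int is converted to double on return. */
double widen(int n)
{
    return n;
}
```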


9.11 Goto statement 
Control may be transferred unconditionally by means of the statement 
goto identifier ; 


The identifier must be a label (§9.12) located in the current function. Previous versions of C
had an incompletely implemented notion of label variable, which has been withdrawn.


9.12 Labelled statement 
Any statement may be preceded by label prefixes of the form 
identifier : 
which serve to declare the identifier as a label. The only use of a label is as a target of a goto. 


The scope of a label is the current function, excluding any sub-blocks in which the same
identifier has been redeclared. See §11.
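The sketch below uses a label as a goto target in a hypothetical find_in_grid function. The goto leaves both loops at once. break (§9.8) cannot do this, since it exits only the smallest enclosing loop.

```c
/* Hypothetical demo: return the flat index of want in g, or -1. */
int find_in_grid(int g[3][3], int want)
{
    int i, j;
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            if (g[i][j] == want)
                goto found;    /* leaves both loops */
    return -1;
found:                         /* label: only a target of goto */
    return i * 3 + j;
}
```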


9.13 Null statement 

The null statement has the form
;
A null statement is useful to carry a label just before the ‘}’ of a compound statement or to 
supply a null body to a looping statement such as while. 
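The following minimal sketch uses a null statement as a loop body. my_strlen is a hypothetical name chosen to avoid clashing with the library's strlen.

```c
#include <stddef.h>

/* Hypothetical demo: all the work happens in the loop test. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p++)
        ;                        /* null statement as the loop body */
    return (size_t) (p - s - 1); /* p ends one past the terminating '\0' */
}
```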


10. External definitions 


A C program consists of a sequence of external definitions. An external definition declares an
identifier to have storage class extern (by default) or perhaps static, and a specified type. The
type-specifier (§8.2) may also be empty, in which case the type is taken to be int. The scope of
external definitions persists to the end of the file in which they are declared just as the effect of
declarations persists to the end of a block. The syntax of external definitions is the same as
that of all declarations, except that only at this level may the code for functions be given.
10.1 External function definitions
Function definitions have the form 
function-definition:
    decl-specifiers_opt function-declarator function-body
The only sc-specifiers allowed among the decl-specifiers are extern or static; see §11.2 for the
distinction between them. A function declarator is similar to a declarator for a ‘function
returning ...’ except that it lists the formal parameters of the function being defined.


function-declarator:
    declarator ( parameter-list_opt )


parameter-list:
    identifier
    identifier , parameter-list


The function-body has the form 
function-body:
    declaration-list compound-statement

The identifiers in the parameter list, and only those identifiers, may be declared in the
declaration list. Any identifiers whose type is not given are taken to be int. The only storage class
which may be specified is register; if it is specified, the corresponding actual parameter will be
copied, if possible, into a register at the outset of the function.


A simple example of a complete function definition is 


int max(a, b, c)
int a, b, c;
{
    int m;
    m = (a > b) ? a : b;
    return(m > c ? m : c);
}


Here ‘int’ is the type-specifier; ‘max(a, b, c)’ is the function-declarator; ‘int a, b, c;’ is the
declaration-list for the formal parameters; ‘{...}’ is the block giving the code for the
statement. The parentheses in the return are not required.


C converts all float actual parameters to double, so formal parameters declared float have 
their declaration adjusted to read double. Also, since a reference to an array in any context (in 
particular as an actual parameter) is taken to mean a pointer to the first element of the array, 
declarations of formal parameters declared ‘array of ...’ are adjusted to read ‘pointer to ...’. 
Finally, because neither structures nor functions can be passed to a function, it is useless to
declare a formal parameter to be a structure or function (pointers to structures or functions are
of course permitted).


A free return statement is supplied at the end of each function definition, so running off 
the end causes control, but no value, to be returned to the caller. 
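For comparison, the sketch below gives the same max function in the prototype style that later (ANSI) C compilers expect. It is followed by a hypothetical second_element function, which shows that an array parameter is really a pointer: it can be incremented, which an array could not be.

```c
/* The max example above, with ANSI prototype-style parameters. */
int max(int a, int b, int c)
{
    int m = (a > b) ? a : b;
    return m > c ? m : c;
}

/* Hypothetical demo: v is declared as an array but adjusted to
   "pointer to int", so it can be incremented like any pointer. */
int second_element(int v[])
{
    v++;
    return *v;
}
```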


10.2 External data definitions 
An external data definition has the form 
data-definition: 
declaration 


The storage class of such data may be extern (which is the default) or static, but not auto or 
register. 
11. Scope rules


A C program need not all be compiled at the same time: the source text of the program may be
kept in several files, and precompiled routines may be loaded from libraries. Communication
among the functions of a program may be carried out both through explicit calls and through
manipulation of external data.


Therefore, there are two kinds of scope to consider: first, what may be called the lexical
scope of an identifier, which is essentially the region of a program during which it may be used
without drawing ‘undefined identifier’ diagnostics; and second, the scope associated with
external identifiers, which is characterized by the rule that references to the same external
identifier are references to the same object.


11.1 Lexical scope 


The lexical scope of identifiers declared in external definitions persists from the definition 
through the end of the file in which they appear. The lexical scope of identifiers which are for- 
mal parameters persists through the function with which they are associated. The lexical scope 
of identifiers declared at the head of blocks persists until the end of the block. The lexical
scope of labels is the whole of the function in which they appear. 


Because all references to the same external identifier refer to the same object (see §11.2),
the compiler checks all declarations of the same external identifier for compatibility; in effect 
their scope is increased to the whole file in which they appear. 


In all cases, however, if an identifier is explicitly declared at the head of a block, including 
the block constituting a function, any declaration of that identifier outside the block is 
suspended until the end of the block. 


Remember also (§8.5) that identifiers associated with ordinary variables on the one hand 
and those associated with structure and union members and tags on the other form two disjoint 
classes which do not conflict. Typedef names are in the same class as ordinary identifiers. 
They may be redeclared in inner blocks, but an explicit type must be given in the inner
declaration:


typedef float distance;
...
{
    auto int distance;
    ...
}


The int must be present in the second declaration, or it would be taken to be a declaration with
no declarators and type distance.*


*It is agreed that the ice is thin here.


11.2 Scope of externals


If a function declares an identifier to be extern, then somewhere among the files or libraries
constituting the complete program there must be an external definition for the identifier. All
functions in a given program which refer to the same external identifier refer to the same
object, so care must be taken that the type and extent specified in the definition are compatible
with those specified by each function which references the data.


In PDP-11 C, compatible external definitions of the same identifier may be present in
several of the separately-compiled pieces of a complete program, or even twice within the same
program file, with the limitation that the identifier may be initialized in at most one of the
definitions. In other operating systems, however, the compiler must know in just which file the
storage for the identifier is allocated, and in which file the identifier is merely being referred to.
The appearance of the extern keyword in an external definition indicates that storage for the
identifiers being declared will be allocated in another file. Thus in a multi-file program, an
external data definition without the extern specifier must appear in exactly one of the files.
Any other files which wish to give an external definition for the identifier must include the
extern in the definition. The identifier can be initialized only in the declaration where storage
is allocated.


Identifiers declared static at the top level in external definitions are not visible in other 
files. 
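Here is a single-file sketch of the extern and static rules. shared_counter, bump, file_private and read_private are hypothetical names. In a real program, the extern declaration and the defining declaration would normally live in different files.

```c
/* Hypothetical demo. The extern declaration says storage is allocated
   elsewhere; here "elsewhere" is later in the same file only so the
   sketch compiles on its own. */
extern int shared_counter;

int bump(void)
{
    return ++shared_counter;
}

int shared_counter = 100;     /* the one definition: storage and initializer */

static int file_private = 7;  /* static: not visible in other files */

int read_private(void)
{
    return file_private;
}
```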


12. Compiler control lines 


The C compiler contains a preprocessor capable of macro substitution, conditional compilation, 
and inclusion of named files. Lines beginning with ‘#’ communicate with this preprocessor. 
These lines have syntax independent of the rest of the language; they may appear anywhere and 
have effect which lasts (independent of scope) until the end of the source program file. 


12.1 Token replacement
A compiler-control line of the form 
# define identifier token-string 


(note: no trailing semicolon) causes the preprocessor to replace subsequent instances of the 
identifier with the given string of tokens. A line of the form 


# define identifier( identifier, ... , identifier) token-string 


where there is no space between the first identifier and the ‘(’, is a macro definition with
arguments. Subsequent instances of the first identifier followed by a ‘(’, a sequence of tokens
delimited by commas, and a ‘)’ are replaced by the token string in the definition. Each occurrence
of an identifier mentioned in the formal parameter list of the definition is replaced by the
corresponding token string from the call. The actual arguments in the call are token strings
separated by commas; however commas in quoted strings or protected by parentheses do not
separate arguments. The number of formal and actual parameters must be the same. Text
inside a string or a character constant is not subject to replacement.


In both forms the replacement string is rescanned for more defined identifiers. In both 
forms a long definition may be continued on another line by writing ‘\’ at the end of the line to 
be continued. 


This facility is most valuable for definition of ‘manifest constants’, as in 
# define TABSIZE 100 


int table[TABSIZE];


A control line of the form 
# undef identifier 
causes the identifier’s preprocessor definition to be forgotten. 
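The sketch below shows both forms of #define, a definition continued onto a second line, and #undef. Every macro name here except TABSIZE is hypothetical. BAD_SQUARE shows that replacement is purely textual, which is why macro bodies and arguments are customarily parenthesized.

```c
#define TABSIZE 100
#define SQUARE(x) ((x) * (x))      /* hypothetical, fully parenthesized */
#define BAD_SQUARE(x) x * x        /* hypothetical, no parentheses */
#define MAX2(a, b) \
    ((a) > (b) ? (a) : (b))

int table[TABSIZE];

int square_plus_one(int n)
{
    return SQUARE(n + 1);       /* becomes ((n + 1) * (n + 1)) */
}

int bad_square_plus_one(int n)
{
    return BAD_SQUARE(n + 1);   /* becomes n + 1 * n + 1, that is 2n + 1 */
}

int bigger(int a, int b)
{
    return MAX2(a, b);          /* definition continued with a trailing backslash */
}

#undef SQUARE                   /* SQUARE is forgotten from here on */
```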


12.2 File inclusion 
A compiler control line of the form 
# include "filename" 
causes the replacement of that line by the entire contents of the file filename.


The named file is searched for first in the directory of the original source file, and then in 
a sequence of standard places. Alternatively, a control line of the form 


# include <filename> 
searches only the standard places, and not the directory of the source file. 
Includes may be nested. 
12.3 Conditional compilation
A compiler control line of the form 
# if constant-expression 


checks whether the constant expression (see §15) evaluates to non-zero. A control line of the
form


# ifdef identifier 


checks whether the identifier is currently defined in the preprocessor; that is, whether it has 
been the subject of a #define control line. A control line of the form 


# ifndef identifier 
checks whether the identifier is currently undefined in the preprocessor. 

All three forms are followed by an arbitrary number of lines, possibly containing a control 
line 

# else 
and then by a control line 

# endif 


If the checked condition is true then any lines between #else and #endif are ignored. If the 
checked condition is false then any lines between the test and an #else or, lacking an #else, 
the #endif, are ignored. 


These constructions may be nested. 
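The sketch below shows the three conditional forms. DEBUG_LEVEL, BUFSIZE_HINT and VERBOSE are hypothetical configuration names. The #ifndef idiom supplies a default only when the name was not defined earlier, for example on the compiler command line.

```c
/* Hypothetical configuration macro. */
#define DEBUG_LEVEL 2

#ifndef BUFSIZE_HINT
#define BUFSIZE_HINT 512        /* hypothetical default, used only if not already defined */
#endif

int debug_enabled(void)
{
#if DEBUG_LEVEL > 1
    return 1;
#else
    return 0;
#endif
}

int verbose_enabled(void)
{
#ifdef VERBOSE                  /* hypothetical name, not defined here */
    return 1;
#else
    return 0;
#endif
}
```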


12.4 Line control
For the benefit of other preprocessors which generate C programs, a line of the form
# line constant identifier
causes the compiler to believe, for purposes of error diagnostics, that the next line number is
given by the constant and the current input file is named by the identifier. If the identifier is 
absent the remembered file name does not change. 
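Here is a minimal sketch, assuming an ANSI C compiler. Two things differ from the description above. First, ANSI C writes the optional file name as a string literal rather than a bare identifier. Second, it provides the predefined macro __LINE__, which is used here to observe the effect. The function name is hypothetical.

```c
/* Hypothetical demo: renumber the following line as 1000. */
int reported_line(void)
{
#line 1000
    return __LINE__;     /* the compiler now believes this is line 1000 */
}
```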


13. Implicit declarations 


It is not always necessary to specify both the storage class and the type of identifiers in a 
declaration. Sometimes the storage class is supplied by the context: in external definitions, and 
in declarations of formal parameters and structure members. In a declaration inside a function, 
if a storage class but no type is given, the identifier is assumed to be int; if a type but no 
storage class is indicated, the identifier is assumed to be auto. An exception to the latter rule is 
made for functions, since auto functions are meaningless (C being incapable of compiling code 
into the stack). If the type of an identifier is ‘function returning ...’, it is implicitly declared to 
be extern. 


In an expression, an identifier followed by ( and not currently declared is contextually
declared to be ‘function returning int’.


14. Types revisited 
This section summarizes the operations which can be performed on objects of certain types. 


14.1 Structures and unions 


There are only two things that can be done with a structure or union: name one of its members 
(by means of the . operator); or take its address (by unary &). Other operations, such as 
assigning from or to it or passing it as a parameter, draw an error message. In the future, it is 
expected that these operations, but not necessarily others, will be allowed. 


§7.1 says that in a direct or indirect structure reference (with . or ->) the name on the
right must be a member of the structure named or pointed to by the expression on the left. To
allow an escape from the typing rules, this restriction is not firmly enforced by the compiler. In
fact, any lvalue is allowed before ‘.’, and that lvalue is then assumed to have the form of the
structure of which the name on the right is a member. Also, the expression before a ‘->’ is
required only to be a pointer or an integer. If a pointer, it is assumed to point to a structure of
which the name on the right is a member. If an integer, it is taken to be the absolute address,
in machine storage units, of the appropriate structure.


Such constructions are non-portable. 


14.2 Functions 


There are only two things that can be done with a function: call it, or take its address. If the 
name of a function appears in an expression not in the function-name position of a call, a 
pointer to the function is generated. Thus, to pass one function to another, one might say 


int f();
...
g(f);


Then the definition of g might read


g(funcp)
int (*funcp)();
{
    ...
    (*funcp)();
    ...
}


Notice that f was declared explicitly in the calling routine since its first appearance was not
followed by (.
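The sketch below shows the same pattern in modern prototype style. f and g follow the text, while the counter calls is a hypothetical addition so that the calls can be observed. ANSI C also accepts funcp() as shorthand for (*funcp)().

```c
static int calls = 0;    /* hypothetical counter to observe the calls */

int f(void)
{
    return ++calls;
}

/* g receives a pointer to a function returning int. */
int g(int (*funcp)(void))
{
    (*funcp)();          /* call through the pointer, as in the text */
    return funcp();      /* ANSI C shorthand for the same call */
}
```

Usage: g(f) passes a pointer to f, since f is not in the function-name position of a call.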


14.3 Arrays, pointers, and subscripting 


Every time an identifier of array type appears in an expression, it is converted into a pointer to
the first member of the array. Because of this conversion, arrays are not lvalues. By definition,
the subscript operator [] is interpreted in such a way that ‘E1[E2]’ is identical to
‘*((E1) + (E2))’. Because of the conversion rules which apply to +, if E1 is an array and E2
an integer, then E1[E2] refers to the E2-th member of E1. Therefore, despite its asymmetric
appearance, subscripting is a commutative operation.


A consistent rule is followed in the case of multi-dimensional arrays. If E is an
n-dimensional array of rank i×j×...×k, then E appearing in an expression is converted to a
pointer to an (n-1)-dimensional array with rank j×...×k. If the * operator, either explicitly
or implicitly as a result of subscripting, is applied to this pointer, the result is the pointed-to
(n-1)-dimensional array, which itself is immediately converted into a pointer.


For example, consider 
int x[3][5];


Here x is a 3×5 array of integers. When x appears in an expression, it is converted to a pointer
to (the first of three) 5-membered arrays of integers. In the expression ‘x[i]’, which is
equivalent to ‘*(x+i)’, x is first converted to a pointer as described; then i is converted to the
type of x, which involves multiplying i by the length of the object to which the pointer points,
namely 5 integer objects. The results are added and indirection applied to yield an array (of 5
integers) which in turn is converted to a pointer to the first of the integers. If there is another
subscript the same argument applies again; this time the result is an integer.


It follows from all this that arrays in C are stored row-wise (last subscript varies fastest)
and that the first subscript in the declaration helps determine the amount of storage consumed
by an array but plays no other part in subscript calculations.
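The sketch below checks these rules on the 3×5 array x from the example. by_pointer, commuted and row_offset_bytes are hypothetical names, and the initial values are arbitrary.

```c
#include <stddef.h>

/* Arbitrary illustrative values: element [i][j] holds 10*i + j. */
int x[3][5] = {
    {  0,  1,  2,  3,  4 },
    { 10, 11, 12, 13, 14 },
    { 20, 21, 22, 23, 24 }
};

/* x[i][j] written out as the text describes: *(*(x + i) + j). */
int by_pointer(int i, int j)
{
    return *(*(x + i) + j);
}

/* E1[E2] is *((E1) + (E2)), so the operands may be swapped. */
int commuted(int k)
{
    static const int a[4] = { 5, 6, 7, 8 };
    return k[a];         /* same element as a[k]; legal but poor style */
}

/* Row-wise storage: row 1 begins 5 ints after row 0 begins. */
size_t row_offset_bytes(void)
{
    return (size_t) ((char *) &x[1][0] - (char *) &x[0][0]);
}
```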


15. Constant expressions

In several places C requires expressions which evaluate to a constant: after case, as array
bounds, and in initializers. In the first two cases, the expression can involve only integer
constants, character constants, and sizeof expressions, possibly connected by the binary operators
+ - * / % & | ^ << >> == != < > <= >=

or by the unary operators
- ~

or by the ternary operator
?:

Parentheses can be used for grouping, but not for function calls.


A bit more latitude is permitted for initializers; besides constant expressions as discussed 
above, one can also apply the unary & operator to external or static objects, and to external or 
static arrays subscripted with a constant expression. The unary & can also be applied implicitly 
by appearance of unsubscripted arrays and functions. The basic rule is that initializers must 
evaluate either to a constant or to the address of a previously declared external or static object 
plus or minus a constant. 
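The sketch below shows constant expressions in each of the places listed. N and all other names are hypothetical. The switch labels use a character constant and sizeof. The last initializers use the address of an external object and the address of an external array subscripted by a constant.

```c
#include <stddef.h>

#define N 4                      /* hypothetical manifest constant */

/* Array bound built from constants and the allowed operators. */
static char buf[N * 2 + (N > 3 ? 1 : 0)];   /* 4*2 + 1 = 9 */

size_t buf_size(void)
{
    return sizeof buf;
}

/* Case labels using a character constant and sizeof. */
int case_demo(int c)
{
    switch (c) {
    case 'a' + 1:                /* the value of 'b' */
        return 1;
    case sizeof(int) << 8:
        return 2;
    default:
        return 0;
    }
}

/* Initializers: addresses of external objects, plus a constant subscript. */
int counter;
int table[10];
int *counter_ptr = &counter;
int *middle = &table[5];
```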
16. Grammar revisited


This section repeats the grammar of C in notation somewhat different than given before.
The description below is adapted directly from a YACC grammar actually used by several
compilers; thus it may (aside from possible editing errors) be regarded as authentic. The notation
is pure YACC with the exception that the ‘|’ separating alternatives for a production is omitted,
since alternatives are always on separate lines; the ‘;’ separating productions is omitted since a
blank line is left between productions.


The lines with ‘%term’ name the terminal symbols, which are either commented upon or
should be self-evident. The lines with ‘%left,’ ‘%right,’ and ‘%binary’ indicate whether the
listed terminals are left-associative, right-associative, or non-associative, and describe a
precedence structure. The precedence (binding strength) increases as one reads down the page.
When the construction ‘%prec x’ appears the precedence of the rule is that of the terminal x;
otherwise the precedence of the rule is that of its leftmost terminal.


%term NAME
%term STRING
%term ICON
%term FCON
%term PLUS
%term MINUS
%term MUL
%term AND
%term QUEST
%term COLON
%term ANDAND
%term OROR

%term ASOP      /* old-style =+ etc. */
%term RELOP     /* <= >= < > */
%term EQUOP     /* == != */
%term DIVOP     /* / % */
%term OR        /* | */
%term EXOR      /* ^ */
%term SHIFTOP   /* << >> */
%term INCOP     /* ++ -- */
%term UNOP      /* ! ~ */
%term STROP     /* . -> */

%term TYPE      /* int, char, long, float, double, unsigned, short */
%term CLASS     /* extern, register, auto, static, typedef */
%term STRUCT    /* struct or union */
%term RETURN
%term GOTO
%term IF
%term ELSE
%term SWITCH
%term BREAK
%term CONTINUE
%term WHILE
%term DO
%term FOR
%term DEFAULT
%term CASE
%term SIZEOF

%term LP        /* ( */
%term RP        /* ) */
%term LC        /* { */
%term RC        /* } */
%term LB        /* [ */
%term RB        /* ] */
%term CM        /* , */
%term SM        /* ; */
%term ASSIGN    /* = */

%left CM
%right ASOP ASSIGN
%right QUEST COLON
%left OROR
%left ANDAND
%left OROP
%left AND
%binary EQUOP
%binary RELOP
%left SHIFTOP
%left PLUS MINUS
%left MUL DIVOP
%right UNOP
%right INCOP SIZEOF
%left LB LP STROP


program:
    ext_def_list

ext_def_list:
    ext_def_list external_def
    /* empty */

external_def:
    optattrib SM
    optattrib init_dcl_list SM
    optattrib fdeclarator function_body

function_body:
    dcl_list compoundstmt

dcl_list:
    dcl_list declaration
    /* empty */

declaration:
    specifiers declarator_list SM
    specifiers SM

optattrib:
    specifiers
    /* empty */

specifiers:
    CLASS type
    type CLASS
    CLASS
    type

type:
    TYPE
    TYPE TYPE
    struct_dcl

struct_dcl:
    STRUCT NAME LC type_dcl_list RC
    STRUCT LC type_dcl_list RC
    STRUCT NAME

type_dcl_list:
    type_declaration
    type_dcl_list type_declaration

type_declaration:
    type declarator_list SM
    struct_dcl SM
    type SM

declarator_list:
    declarator
    declarator_list CM declarator

declarator:
    fdeclarator
    nfdeclarator
    nfdeclarator COLON con_e %prec CM
    COLON con_e %prec CM

nfdeclarator:
    MUL nfdeclarator
    nfdeclarator LP RP
    nfdeclarator LB RB
    nfdeclarator LB con_e RB
    NAME
    LP nfdeclarator RP

fdeclarator:
    MUL fdeclarator
    fdeclarator LP RP
    fdeclarator LB RB
    fdeclarator LB con_e RB
    LP fdeclarator RP
    NAME LP name_list RP
    NAME LP RP

name_list:
    NAME
    name_list CM NAME

init_dcl_list:
    init_declarator %prec CM
    init_dcl_list CM init_declarator

init_declarator:
    nfdeclarator
    nfdeclarator ASSIGN initializer
    nfdeclarator initializer
    fdeclarator

init_list:
    initializer %prec CM
    init_list CM initializer

initializer:
    e %prec CM
    LC init_list RC
    LC init_list CM RC

compoundstmt:
    LC dcl_list stmt_list RC

stmt_list:
    stmt_list statement
    /* empty */

statement:
    e SM
    compoundstmt
    IF LP e RP statement
    IF LP e RP statement ELSE statement
    WHILE LP e RP statement
    DO statement WHILE LP e RP SM
    FOR LP opt_e SM opt_e SM opt_e RP statement
    SWITCH LP e RP statement
    BREAK SM
    CONTINUE SM
    RETURN SM
    RETURN e SM
    GOTO NAME SM
    SM
    label statement

label:
    NAME COLON
    CASE con_e COLON
    DEFAULT COLON

con_e:
    e %prec CM

opt_e:
    e
    /* empty */

elist:
    e %prec CM
    elist CM e

e:
    e MUL e
    e CM e
    e DIVOP e
    e PLUS e
    e MINUS e
    e SHIFTOP e
    e RELOP e
    e EQUOP e
    e AND e
    e OROP e
    e ANDAND e
    e OROR e
    e MUL ASSIGN e
    e DIVOP ASSIGN e
    e PLUS ASSIGN e
    e MINUS ASSIGN e
    e SHIFTOP ASSIGN e
    e AND ASSIGN e
    e OROP ASSIGN e
    e QUEST e COLON e
    e ASOP e
    e ASSIGN e
    term

term:
    term INCOP
    MUL term
    AND term
    MINUS term
    UNOP term
    INCOP term
    SIZEOF term
    LP type_name RP term %prec STROP
    SIZEOF LP type_name RP %prec SIZEOF
    term LB e RB
    term LP RP
    term LP elist RP
    term STROP NAME
    NAME
    ICON
    FCON
    STRING
    LP e RP

type_name:
    type abst_decl

abst_decl:
    /* empty */
    LP RP
    LP abst_decl RP LP RP
    MUL abst_decl
    abst_decl LB RB
    abst_decl LB con_e RB
    LP abst_decl RP


Programming in C — A Tutorial 
Brian W. Kernighan 


Bell Laboratories, Murray Hill, N. J. 


1. Introduction

C is a computer language available on the GCOS and UNIX operating systems at Murray
Hill and (in preliminary form) on OS/360 at Holmdel. C lets you write your programs clearly
and simply: it has decent control flow facilities so your code can be read straight down the
page, without labels or GOTO’s; it lets you write code that is compact without being too
cryptic; it encourages modularity and good program organization; and it provides good
data-structuring facilities.

This memorandum is a tutorial to make learning C as painless as possible. The first part
concentrates on the central features of C; the second part discusses those parts of the language
which are useful (usually for getting more efficient and smaller code) but which are not
necessary for the new user. This is not a reference manual. Details and special cases will be
skipped ruthlessly, and no attempt will be made to cover every language feature. The order of
presentation is hopefully pedagogical instead of logical. Users who would like the full story should
consult the C Reference Manual by D. M. Ritchie [1], which should be read for details anyway.
Runtime support is described in [2] and [3]; you will have to read one of these to learn how to
compile and run a C program.

We will assume that you are familiar with the mysteries of creating files, text editing, and 
the like in the operating system you run on, and that you have programmed in some language
before. 


2. A Simple C Program


main( ) {
    printf("hello, world");
}


A C program consists of one or more functions, which are similar to the functions and
subroutines of a Fortran program or the procedures of PL/I, and perhaps some external data
definitions. main is such a function, and in fact all C programs must have a main. Execution
of the program begins at the first statement of main. main will usually invoke other functions
to perform its job, some coming from the same program, and others from libraries.

One method of communicating data between functions is by arguments. The parentheses
following the function name surround the argument list; here main is a function of no
arguments, indicated by (). The {} enclose the statements of the function. Individual statements
end with a semicolon but are otherwise free-format.
printf is a library function which will format and print output on the terminal (unless
some other destination is specified). In this case it prints


hello, world


A function is invoked by naming it, followed by a list of arguments in parentheses. There is 
no CALL statement as in Fortran or PL/I. 


3. A Working C Program: Variables; Types and Type Declarations
Here’s a bigger program that adds three integers and prints their sum. 


main( ) {
    int a, b, c, sum;
    a = 1;  b = 2;  c = 3;
    sum = a + b + c;
    printf("sum is %d", sum);
}


Arithmetic and the assignment statements are much the same as in Fortran (except for
the semicolons) or PL/I. The format of C programs is quite free. We can put several
statements on a line if we want, or we can split a statement among several lines if it seems
desirable. The split may be between any of the operators or variables, but not in the middle of a
name or operator. As a matter of style, spaces, tabs, and newlines should be used freely to
enhance readability.


C has four fundamental types of variables:


int       integer (PDP-11: 16 bits; H6070: 36 bits; IBM360: 32 bits)
char      one byte character (PDP-11, IBM360: 8 bits; H6070: 9 bits)
float     single-precision floating point
double    double-precision floating point


There are also arrays and structures of these basic types, pointers to them and functions that
return them, all of which we will meet shortly.

All variables in a C program must be declared, although this can sometimes be done im- 
plicitly by context. Declarations must precede executable statements. The declaration 


int a, b, c, sum;


declares a, b, c, and sum to be integers.


Variable names have one to eight characters, chosen from A-Z, a-z, 0-9, and _, and start
with a non-digit. Stylistically, it's much better to use only a single case and give functions and 
external variables names that are unique in the first six characters. (Function and external 
variable names are used by various assemblers, some of which are limited in the size and case 
of identifiers they can handle.) Furthermore, keywords and library functions may only be 
recognized in one case. 


4. Constants 


We have already seen decimal integer constants in the previous example: 1, 2, and 3.
Since C is often used for system programming and bit-manipulation, octal numbers are an
important part of the language. In C, any number that begins with 0 (zero!) is an octal integer
(and hence can't have any 8's or 9's in it). Thus 0777 is an octal constant, with decimal value
511.
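To see where 511 comes from, here is a small sketch in modern C that evaluates a string of octal digits the way the compiler reads 0777: each digit is worth eight times the digit to its right. The helper octal_value is hypothetical, not part of the tutorial or of the C library.

```c
/* Hypothetical helper: the value of a string of octal digits such as "777".
   Stops at the first character that is not 0-7. */
int octal_value(const char *digits)
{
    int value = 0;

    while (*digits >= '0' && *digits <= '7') {
        value = value * 8 + (*digits - '0');   /* shift one octal place, add the digit */
        digits++;
    }
    return value;
}
```

For "777" the loop computes (7*8 + 7)*8 + 7 = 511, the same value as the constant 0777.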

A “character” is one byte (an inherently machine-dependent concept). Most often this 
is expressed as a character constant, which is one character enclosed in single quotes. However,
it may be any quantity that fits in a byte, as in flags below: 
char quest, newline, flags;
quest = '?';
newline = '\n';
flags = 077;


The sequence '\n' is C notation for "newline character", which, when printed, skips the
terminal to the beginning of the next line. Notice that '\n' represents only a single character.
There are several other "escapes" like '\n' for representing hard-to-get or invisible characters,
such as '\t' for tab, '\b' for backspace, '\0' for end of file, and '\\' for the backslash itself.


float and double constants are discussed in section 26. 


5. Simple I/O: getchar, putchar, printf


main( ) {
	char c;
	c = getchar( );
	putchar(c);
}

getchar and putchar are the basic I/O library functions in C. getchar fetches one char- 
acter from the standard input (usually the terminal) each time it is called, and returns that 
character as the value of the function. When it reaches the end of whatever file it is reading,
thereafter it returns the character represented by '\0' (ascii NUL, which has value zero). We
will see how to use this very shortly. 

putchar puts one character out on the standard output (usually the terminal) each time it
is called. So the program above reads one character and writes it back out. By itself, this isn't
very interesting, but observe that if we put a loop around this, and add a test for end of file, we
have a complete program for copying one file to another.

printf is a more complicated function for producing formatted output. We will talk about 
only the simplest use of it. Basically, printf uses its first argument as formatting information,
and any successive arguments as variables to be output. Thus 


printf("hello, world\n");
is the simplest use: the string "hello, world\n" is printed out. No formatting information, no
variables, so the string is dumped out verbatim. The newline is necessary to put this out on a
line by itself. (The construction

"hello, world\n"
is really an array of chars. More about this shortly.)

More complicated: if sum is 6,

printf("sum is %d\n", sum);
prints

sum is 6


Within the first argument of printf, the characters "%d" signify that the next argument in the
argument list is to be printed as a base 10 number.


Other useful formatting commands are "%c" to print out a single character, "%s" to print
out an entire string, and "%o" to print a number as octal instead of decimal (no leading zero).


For example, 


n = 511;
printf("What is the value of %d in octal?", n);
printf(" %s! %d decimal is %o octal\n", "Right", n, n);
prints
What is the value of 511 in octal? Right! 511 decimal is 777 octal


Notice that there is no newline at the end of the first output line. Successive calls to printf
(and/or putchar, for that matter) simply put out characters. No newlines are printed unless 
you ask for them. Similarly, on input, characters are read one at a time as you ask for them. 
Each line is generally terminated by a newline (\n), but there is otherwise no concept of 
record. 
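The format commands can be tried without a terminal by formatting into a character buffer. This sketch assumes a C99 library, which provides snprintf; describe_octal is a hypothetical name, and the cast to unsigned is there because %o expects an unsigned int.

```c
#include <stdio.h>

/* Hypothetical helper: formats the second line of the example into buf
   with %s, %d and %o, returning the length snprintf reports. */
int describe_octal(char *buf, size_t size, int n)
{
    return snprintf(buf, size, "%s! %d decimal is %o octal",
                    "Right", n, (unsigned) n);
}
```

With n = 511 the buffer holds "Right! 511 decimal is 777 octal", 31 characters, and no newline, since none was asked for.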


6. if; relational operators; compound statements 
The basic conditional-testing statement in C is the if statement:


c = getchar( );
if( c == '?' )
	printf("why did you type a question mark?\n");
The simplest form of if is 


if (expression) statement 


The condition to be tested is any expression enclosed in parentheses. It is followed by a
statement. The expression is evaluated, and if its value is non-zero, the statement is executed. 
There’s an optional else clause, to be described soon. 

The character sequence '==' is one of the relational operators in C; here is the complete
set:

==   equal to (.EQ. to Fortraners)
!=   not equal to
>    greater than
<    less than
>=   greater than or equal to
<=   less than or equal to


The value of "expression relation expression" is 1 if the relation is true, and 0 if false.
Don't forget that the equality test is ‘==’: a single ‘=’ causes an assignment, not a test, and in- 
variably leads to disaster. 

Tests can be combined with the operators '&&' (AND), '||' (OR), and '!' (NOT). For example,
we can test whether a character is blank or tab or newline with

if( c==' ' || c=='\t' || c=='\n' ) ...


C guarantees that '&&' and '||' are evaluated left to right; we shall soon see cases where this
matters.
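A small sketch of these operators in modern C; is_blank and starts_with_dash are illustrative names, not library functions. The second function depends on the left-to-right guarantee: the pointer is checked before it is dereferenced, and evaluation stops as soon as the answer is known.

```c
#include <stddef.h>

/* The blank-or-tab-or-newline test from the text, as a function. */
int is_blank(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Safe because of left-to-right evaluation: if s is NULL,
   && stops before s[0] is ever read. */
int starts_with_dash(const char *s)
{
    return s != NULL && s[0] == '-';
}
```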
One of the nice things about C is that the statement part of an if can be made arbitrarily
complicated by enclosing a set of statements in {}. As a simple example, suppose we want to
ensure that a is bigger than b, as part of a sort routine. The interchange of a and b takes three
statements in C, grouped together by {}:


if (a < b) {
	t = a;
	a = b;
	b = t;
}

As a general rule in C, anywhere you can use a simple statement, you can use any compound
statement, which is just a number of simple or compound ones enclosed in {}. There is
no semicolon after the } of a compound statement, but there is a semicolon after the last
non-compound statement inside the {}.

The ability to replace single statements by complex ones at will is one feature that makes
C much more pleasant to use than Fortran. Logic (like the exchange in the previous example) 
which would require several GOTO’s and labels in Fortran can and should be done in C 
without any, using compound statements. 


7. While Statement; Assignment within an Expression; Null Statement 


The basic looping mechanism in C is the while statement. Here's a program that copies
its input to its output a character at a time. Remember that '\0' marks the end of file.


main( ) {
	char c;
	while( (c=getchar( )) != '\0' )
		putchar(c);
}
The while statement is a loop, whose general form is 
while (expression) statement 


Its meaning is


(a) evaluate the expression
(b) if its value is true (i.e., not zero),
    do the statement, and go back to (a)


Because the expression is tested before the statement is executed, the statement part can be 
executed zero times, which is often desirable. As in the if statement, the expression and the 
statement can both be arbitrarily complicated, although we haven't seen that yet. Our example 
gets the character, assigns it to c, and then tests if it's a '\0'. If it is not a '\0', the statement
part of the while is executed, printing the character. The while then repeats. When the input 
character is finally a ‘\0’, the while terminates, and so does main. 


Notice that we used an assignment statement 


c = getchar( ) 


within an expression. This is a handy notational shortcut which often produces clearer code. 
(In fact it is often the only way to write the code cleanly. As an exercise, rewrite the file-copy
without using an assignment inside an expression.) It works because an assignment statement
has a value, just as any other expression does. Its value is the value of the right hand side.
This also implies that we can use multiple assignments like 


x = y = z = 0;


Evaluation goes from right to left. 
By the way, the extra parentheses in the assignment statement within the conditional 
were really necessary: if we had said 
c = getchar( ) != '\0'
c would be set to 0 or 1 depending on whether the character fetched was an end of file or not.


This is because in the absence of parentheses the assignment operator ‘=’ is evaluated after 
the relational operator '!='. When in doubt, or even if not, parenthesize.
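The same copy loop in modern (ANSI) C is sketched below. One assumption differs from the text: current C libraries signal end of file by returning the int value EOF from getchar or getc, not '\0', so c must be an int rather than a char. The name copy_stream is illustrative, and taking FILE pointers instead of the terminal is only there so the loop can be exercised.

```c
#include <stdio.h>

/* Copies in to out one character at a time; returns how many were copied.
   c is an int so it can hold every character value as well as EOF. */
long copy_stream(FILE *in, FILE *out)
{
    int c;
    long n = 0;

    while ((c = getc(in)) != EOF) {   /* the inner parentheses are required */
        putc(c, out);
        n++;
    }
    return n;
}
```

Without the inner parentheses, c would receive the 0-or-1 result of the != test, exactly the trap described above.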
Since putchar(c) returns c as its function value, we could also copy the input to the output
by nesting the calls to getchar and putchar.
main( ) {
	while( putchar(getchar( )) != '\0' ) ;
}


What statement is being repeated? None, or technically, the null statement, because all the
work is really done within the test part of the while. This version is slightly different from the
previous one, because the final '\0' is copied to the output before we decide to stop.


8. Arithmetic 


The arithmetic operators are the usual '+', '-', '*', and '/' (truncating integer division if
the operands are both int), and the remainder or mod operator '%':


x = a%b; 
sets x to the remainder after a is divided by b (i.e., a mod b). The results are machine
dependent unless a and b are both positive.
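Since C99 the standard has settled the negative case: integer division truncates toward zero, so -7 / 3 is -2 and -7 % 3 is -1 (the remainder takes the sign of a). When a result in the range 0 to b-1 is wanted regardless of sign, a common idiom adjusts a negative remainder. The sketch below assumes b > 0; mod_positive is a hypothetical name.

```c
/* Remainder of a divided by b, always in 0..b-1. Assumes b > 0. */
int mod_positive(int a, int b)
{
    int r = a % b;              /* C99: has the sign of a, or is 0 */
    return r < 0 ? r + b : r;
}
```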


In arithmetic, char variables can usually be treated like int variables. Arithmetic on
characters is quite legal, and often makes sense:

c = c + 'A' - 'a';

converts a single lower case ascii character stored in c to upper case, making use of the fact
that corresponding ascii letters are a fixed distance apart. The rule governing this arithmetic is
that all chars are converted to int before the arithmetic is done. Beware that conversion may
involve sign-extension: if the leftmost bit of a character is 1, the resulting integer might be
negative. (This doesn't happen with genuine characters on any current machine.)


So to convert a file into lower case: 


main( ) {
	char c;
	while( (c=getchar( )) != '\0' )
		if( 'A'<=c && c<='Z' )
			putchar(c+'a'-'A');
		else
			putchar(c);
}

Characters have different sizes on different machines. Further, this code won't work on an
IBM machine, because the letters in the ebcdic alphabet are not contiguous.
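The same conversion written as a function, as a sketch that assumes ASCII (or any character set in which A through Z are contiguous, which EBCDIC is not). Standard C also provides tolower in <ctype.h>, which does not depend on that assumption; to_lower_ascii is only an illustrative name.

```c
/* Lower-cases c if it is an ASCII capital letter; other values pass through. */
int to_lower_ascii(int c)
{
    return ('A' <= c && c <= 'Z') ? c - 'A' + 'a' : c;
}
```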


9. Else Clause; Conditional Expressions 
We just used an else after an if. The most general form of if is 
if (expression) statement1 else statement2 


The else part is optional, but often useful. The canonical example sets x to the minimum of a
and b:


if (a < b)
	x = a;
else
	x = b;


Observe that there's a semicolon after x=a. 
C provides an alternate form of conditional which is often more concise. It is called the
“conditional expression’’ because it is a conditional which actually has a value and can be used 
anywhere an expression can. The value of 


a<b ? a : b;
is a if a is less than b; it is b otherwise. In general, the form
expr1 ? expr2 : expr3
means "evaluate expr1. If it is not zero, the value of the whole thing is expr2; otherwise the
value is expr3."
To set x to the minimum of a and b, then:
x = (a<b ? a : b);
The parentheses aren't necessary because '?:' is evaluated before '=', but safety first.
Going a step further, we could write the loop in the lower-case program as 


while( (c=getchar( )) != '\0' )
	putchar( ('A'<=c && c<='Z') ? c-'A'+'a' : c );


If's and else's can be used to construct logic that branches one of several ways and then
rejoins, a common programming structure, in this way:


if(...)
	{...}
else if(...)
	{...}
else if(...)
	{...}
else
	{...}

The conditions are tested in order, and exactly one block is executed: either the first one
whose if is satisfied, or the one for the last else. When this block is finished, the next
statement executed is the one after the last else. If no action is to be taken for the "default"
case, omit the last else.

For example, to count letters, digits and others in a file, we could write

main( ) {
	int let, dig, other, c;
	let = dig = other = 0;
	while( (c=getchar( )) != '\0' )
		if( ('A'<=c && c<='Z') || ('a'<=c && c<='z') ) ++let;
		else if( '0'<=c && c<='9' ) ++dig;
		else ++other;
	printf("%d letters, %d digits, %d others\n", let, dig, other);
}

The '++' operator means "increment by 1"; we will get to it in the next section.
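A testable version of the letter/digit counter, reading from a string instead of the terminal so the counts can be checked. It is a sketch: count_classes is a hypothetical name, it assumes ASCII letter and digit ranges like the text, and it returns its three results through pointers (why that is necessary is explained in the section on function arguments).

```c
/* Counts letters, digits and other characters in the string s
   with the same else-if chain as the program above. */
void count_classes(const char *s, int *let, int *dig, int *other)
{
    *let = *dig = *other = 0;
    for (; *s != '\0'; s++) {
        char c = *s;
        if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))
            ++*let;
        else if ('0' <= c && c <= '9')
            ++*dig;
        else
            ++*other;
    }
}
```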


10. Increment and Decrement Operators 


In addition to the usual '-', C also has two other interesting unary operators, '++' (increment)
and '--' (decrement). Suppose we want to count the lines in a file.
main( ) {
	int c,n;
	n = 0;
	while( (c=getchar( )) != '\0' )
		if( c == '\n' )
			++n;
	printf("%d lines\n", n);
}

++n is equivalent to n=n+1 but clearer, particularly when n is a complicated expression. '++'
and '--' can be applied only to int's and char's (and pointers which we haven't got to yet).

The unusual feature of '++' and '--' is that they can be used either before or after a
variable. The value of ++k is the value of k after it has been incremented. The value of k++
is k before it is incremented. Suppose k is 5. Then

x = ++k;
increments k to 6 and then sets x to the resulting value, i.e. to 6. But
x = k++;
first sets x to 5, and then increments k to 6. The incrementing effect of ++k and k++ is the
same, but their values are respectively 6 and 5. We shall soon see examples where both of
these uses are important.
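The difference between the two forms can be checked directly. This hypothetical helper starts k at 5 each time, as in the text, and reports both results along with the final value of k.

```c
/* Hypothetical helper: applies ++k and k++ to a k that starts at 5. */
void increment_values(int *pre, int *post, int *k_after)
{
    int k = 5;

    *pre = ++k;     /* k becomes 6 first; the value used is 6 */
    k = 5;
    *post = k++;    /* the value used is 5; then k becomes 6 */
    *k_after = k;
}
```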


11. Arrays 
In C, as in Fortran or PL/I, it is possible to make arrays whose elements are basic types.
Thus we can make an array of 10 integers with the declaration

int x[10];

The square brackets mean subscripting; parentheses are used only for function references.
Array indexes begin at zero, so the elements of x are

x[0], x[1], x[2], ..., x[9]

If an array has n elements, the largest subscript is n-1.
Multiple-dimension arrays are provided, though not much used above two dimensions.
The declaration and use look like

int name[10][20];
n = name[i+j][1] + name[k][2];

Subscripts can be arbitrary integer expressions. Multi-dimension arrays are stored by row
(opposite to Fortran), so the rightmost subscript varies fastest; name has 10 rows and 20 columns.
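Row-major storage can be observed by measuring addresses. In this sketch (byte_offset is a hypothetical helper), the distance from name[0][0] to name[i][j] is i*20 + j ints, because each complete row of 20 ints is laid out before the next row begins.

```c
#include <stddef.h>

/* Bytes between name[0][0] and name[i][j] for an int name[10][20]. */
ptrdiff_t byte_offset(int name[10][20], int i, int j)
{
    return (char *) &name[i][j] - (char *) &name[0][0];
}
```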


Here is a program which reads a line, stores it in a buffer, and prints its length (excluding 
the newline at the end). 


main( ) {
	int n, c;
	char line[100];
	n = 0;
	while( (c=getchar( )) != '\n' ) {
		if( n < 100 )
			line[n] = c;
		n++;
	}
	printf("length = %d\n", n);
}

As a more complicated problem, suppose we want to print the count for each line in the
input, still storing the first 100 characters of each line. Try it as an exercise before looking at
the solution:


main( ) {
	int n, c; char line[100];
	n = 0;
	while( (c=getchar( )) != '\0' )
		if( c == '\n' ) {
			printf("%d\n", n);
			n = 0;
		}
		else {
			if( n < 100 ) line[n] = c;
			n++;
		}
}


12. Character Arrays; Strings

Text is usually kept as an array of characters, as we did with line[ ] in the example above.
By convention in C, the last character in a character array should be a '\0' because most
programs that manipulate character arrays expect it. For example, printf uses the '\0' to detect the
end of a character array when printing it out with a '%s'.


We can copy a character array s into another t like this:
i = 0;
while( (t[i]=s[i]) != '\0' )
	i++;


Most of the time we have to put in our own '\0' at the end of a string; if we want to
print the line with printf, it's necessary. This code prints the character count before the line:


main( ) {
	int n;
	char line[100];
	n = 0;
	while( (line[n++]=getchar( )) != '\n' );
	line[n] = '\0';
	printf("%d:\t%s", n, line);
}
Here we increment n in the subscript itself, but only after the previous value has been used.
The character is read, placed in line[n], and only then n is incremented.
There is one place and one place only where C puts in the '\0' at the end of a character
array for you, and that is in the construction
"stuff between double quotes"

The compiler puts a '\0' at the end automatically. Text enclosed in double quotes is called a
string; its properties are precisely those of an (initialized) array of characters.
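The line-reading program above trusts that no line is longer than 99 characters; a longer line would be stored past the end of line. Here is a bounded sketch in modern C that drops what does not fit but always stores the terminating '\0'. Its assumptions: get_line is a hypothetical name, the caller passes max >= 1, and input ends with EOF from getc rather than '\0'.

```c
#include <stdio.h>

/* Reads one line from fp into line[], keeping at most max-1 characters
   and always adding '\0'. Returns how many characters were read,
   including the newline; 0 means end of file. */
int get_line(FILE *fp, char line[], int max)
{
    int c, n = 0, stored = 0;

    while ((c = getc(fp)) != EOF) {
        n++;
        if (stored < max - 1)
            line[stored++] = (char) c;   /* keep it only if there is room */
        if (c == '\n')
            break;
    }
    line[stored] = '\0';
    return n;
}
```

As in the tutorial's version, the count includes the newline; unlike it, the buffer can never overflow.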
13. For Statement

The for statement is a somewhat generalized while that lets us put the initialization and
increment parts of a loop into a single statement along with the test. The general form of the 
for is 

for( initialization; expression; increment )
	statement
The meaning is exactly
initialization;
while( expression ) {
	statement
	increment;
}

Thus, the following code does the same array copy as the example in the previous section:

for( i=0; (t[i]=s[i]) != '\0'; i++ );
This slightly more ornate example adds up the elements of an array:

sum = 0;
for( i=0; i<n; i++)
	sum = sum + array[i];
In the for statement, the initialization can be left out if you want, but the semicolon has
to be there. The increment is also optional. It is not followed by a semicolon. The second
clause, the test, works the same way as in the while: if the expression is true (not zero) do
another loop, otherwise get on with the next statement. As with the while, the for loop may
be done zero times. If the expression is left out, it is taken to be always true, so


for( ; ; ) ...
and
while( 1 ) ...

are both infinite loops.

You might ask why we use a for since it's so much like a while. (You might also ask 
why we use a while because...) The for is usually preferable because it keeps the code where 
it's used and sometimes eliminates the need for compound statements, as in this code that 
zeros a two-dimensional array: 

for( i=0; i<n; i++ )
	for( j=0; j<m; j++ )
		array[i][j] = 0;
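The summing loop from this section, packaged as a function. This is a sketch: sum_array is a hypothetical name, and it accumulates into a long to leave some headroom for large totals.

```c
/* Adds up a[0] .. a[n-1]; the loop body runs zero times when n is 0. */
long sum_array(const int a[], int n)
{
    long sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum = sum + a[i];
    return sum;
}
```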


14. Functions; Comments 

Suppose we want, as part of a larger program, to count the occurrences of the ascii characters
in some input text. Let us also map illegal characters (those with value>127 or <0)
into one pile. Since this is presumably an isolated part of the program, good practice dictates 
making it a separate function. Here is one way: 
main( ) {
	int hist[129];      /* 128 legal chars + 1 illegal group */
	count(hist, 128);   /* count the letters into hist */
	printf( ... );      /* comments look like this; use them */
	                    /* anywhere blanks, tabs or newlines could appear */
}

count(buf, size)
   int size, buf[ ]; {
	int i, c;
	for( i=0; i<=size; i++ )
		buf[i] = 0;                        /* set buf to zero */
	while( (c=getchar( )) != '\0' ) {  /* read til eof */
		if( c > size || c < 0 )
			c = size;                  /* fix illegal input */
		buf[c]++;
	}
	return;
}


We have already seen many examples of calling a function, so let us concentrate on how to 
define one. Since count has two arguments, we need to declare them, as shown, giving their 
types, and in the case of buf, the fact that it is an array. The declarations of arguments go 
between the argument list and the opening '{'. There is no need to specify the size of the array
buf, for it is defined outside of count. 

The return statement simply says to go back to the calling routine. In fact, we could have
omitted it, since a return is implied at the end of a function.

What if we wanted count to return a value, say the number of characters read? The re- 
turn statement allows for this too: 


	int i, c, nchar;
	nchar = 0;
	while( (c=getchar( )) != '\0' ) {
		if( c > size || c < 0 )
			c = size;
		buf[c]++;
		nchar++;
	}
	return(nchar);


Any expression can appear within the parentheses. Here is a function to compute the 
minimum of two integers: 


min(a, b)
   int a, b; {
	return( a < b ? a : b );
}

To copy a character array, we could write the function 
strcopy(s1, s2)    /* copies s1 to s2 */
   char s1[ ], s2[ ]; {
	int i;
	for( i = 0; (s2[i] = s1[i]) != '\0'; i++ );
}

As is often the case, all the work is done by the assignment statement embedded in the test 
part of the for. Again, the declarations of the arguments s1 and s2 omit the sizes, because
they don't matter to strcopy. (In the section on pointers, we will see a more efficient way to
do a string copy.)


There is a subtlety in function usage which can trap the unsuspecting Fortran program- 
mer. Simple variables (not arrays) are passed in C by “‘call by value”, which means that the 
called function is given a copy of its arguments, and doesn’t know their addresses. This makes 
it impossible to change the value of one of the actual input arguments. 

There are two ways out of this dilemma. One is to make special arrangements to pass to
the function the address of a variable instead of its value. The other is to make the variable a 
global or external variable, which is known to each function by its name. We will discuss both 
possibilities in the next few sections. 
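For comparison, here is the counting function with a return value, written in modern C with a prototype-style definition. It is a sketch under two assumptions that differ from the text: input ends with EOF from getc (not '\0'), and the source is passed in as a FILE pointer so the function can be tested. As in the text, buf is declared without a size; the caller's array supplies the storage.

```c
#include <stdio.h>

/* Tallies the characters read from fp into buf[0..size-1]; values outside
   that range go into the extra slot buf[size]. Returns the number read. */
int count(FILE *fp, int buf[], int size)
{
    int i, c, nchar = 0;

    for (i = 0; i <= size; i++)
        buf[i] = 0;                 /* set buf to zero, including the extra slot */
    while ((c = getc(fp)) != EOF) {
        if (c >= size || c < 0)
            c = size;               /* fix illegal input */
        buf[c]++;
        nchar++;
    }
    return nchar;
}
```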


15. Local and External Variables
If we say

f( ) {
	int x;
	...
}
g( ) {
	int x;
	...
}


each x is local to its own routine; the x in f is unrelated to the x in g. (Local variables are
also called "automatic".) Furthermore each local variable in a routine appears only when the
function is called, and disappears when the function is exited. Local variables have no memory
from one call to the next and must be explicitly initialized upon each entry. (There is a static
storage class for making local variables with memory; we won't discuss it.)


As opposed to local variables, external variables are defined external to all functions, and
are (potentially) available to all functions. External storage always remains in existence. To
make variables external we have to define them external to all functions, and, wherever we
want to use them, make a declaration.


main( ) {
	extern int nchar, hist[ ];
	...
	count( );
	...
}

count( ) {
	extern int nchar, hist[ ];
	int i, c;
	...
}

int hist[129];    /* space for histogram */
int nchar;        /* character count */


Roughly speaking, any function that wishes to access an external variable must contain an
extern declaration for it. The declaration is the same as others, except for the added keyword
extern. Furthermore, there must somewhere be a definition of the external variables external
to all functions.

External variables can be initialized; they are set to zero if not explicitly initialized. In its 
simplest form, initialization is done by putting the value (which must be a constant) after the 
definition: 


int nchar 0;
char flag 'f';
etc.

This is discussed further in a later section.
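The initializer syntax shown above is that of early C; ANSI and later compilers require an '=' between the name and the value. Here is a sketch of the same external definitions in the modern form, with a hypothetical function add_chars that updates nchar. Because the definitions come first in the file, later functions in the same file can use them directly; a function in another source file would first need the declaration extern int nchar;.

```c
/* External definitions: outside every function, initialized once. */
int nchar = 0;
char flag = 'f';

/* Hypothetical helper: uses nchar without an extern declaration,
   since its definition is already in scope in this file. */
void add_chars(int n)
{
    nchar = nchar + n;
}
```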


This ends our discussion of what might be called the central core of C. You now have 
enough to write quite substantial C programs, and it would probably be a good idea if you 
paused long enough to do so. The rest of this tutorial will describe some more ornate construc- 
tions, useful but not essential. 


16. Pointers 

A pointer in C is the address of something. It is a rare case indeed when we care what the
specific address itself is, but pointers are a quite common way to get at the contents of
something. The unary operator '&' is used to produce the address of an object, if it has one. Thus


int a, b;
b = &a;
puts the address of a into b. We can't do much with it except print it or pass it to some other
routine, because we haven't given b the right kind of declaration. But if we declare that b is
indeed a pointer to an integer, we're in good shape:


int a, *b, c;
b = &a;
c = *b;
b contains the address of a and 'c = *b' means to use the value in b as an address, i.e., as a
pointer. The effect is that we get back the contents of a, albeit rather indirectly. (It's always
the case that '*&x' is the same as x if x has an address.)


The most frequent use of pointers in C is for walking efficiently along arrays. In fact, in 
the implementation of an array, the array name represents the address of the zeroth element of 
the array, so you can’t use it on the left side of an expression. (You can’t change the address 
of something by assigning to it.) If we say 


char *y;
char x[100];


y is of type pointer to character (although it doesn't yet point anywhere). We can make y
point to an element of x by either of

y = &x[0];
y = x;


Since x is the address of x[0] this is legal and consistent.
Now '*y' gives x[0]. More importantly,

*(y+1)    gives x[1]
*(y+i)    gives x[i]

and the sequence

y = &x[0];
y++;
leaves y pointing at x[1].
Let's use pointers in a function length that computes how long a character array is.
Remember that by convention all character arrays are terminated with a '\0'. (And if they
aren't, this program will blow up inevitably.) The old way:


length(s)
   char s[ ]; {
	int n;
	for( n=0; s[n] != '\0'; )
		n++;
	return(n);
}

Rewriting with pointers gives

length(s)
   char *s; {
	int n;
	for( n=0; *s != '\0'; s++ )
		n++;
	return(n);
}

You can now see why we have to say what kind of thing s points to: if we're to increment it
with s++ we have to increment it by the right amount.

The pointer version is more efficient (this is almost always true) but even more compact 
is 

for( n=0; *s++ != '\0'; n++ );

The '*s' returns a character; the '++' increments the pointer so we'll get the next character
next time around. As you can see, as we make things more efficient, we also make them less
clear. But '*s++' is an idiom so common that you have to know it.


Going a step further, here's our function strcopy that copies a character array s to another
t.

strcopy(s,t)
   char *s, *t; {
	while(*t++ = *s++);
}

We have omitted the test against ‘\0’, because ‘\0’ is identically zero: you will often see the 
code this way. (You must have a space after the ‘=’: see section 25.) 
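Both pointer functions in modern C, as a sketch. A few choices to flag: size_t is used for the count; the while test makes the comparison with '\0' explicit (many compilers warn about a bare assignment used as a condition); and the copy routine is given the hypothetical name copy_str, because identifiers beginning with str and a lowercase letter are reserved for the standard library. It keeps the tutorial's argument order, source first, which is the reverse of the standard strcpy(dst, src).

```c
#include <stddef.h>

/* Pointer version of length: counts characters before the '\0'. */
size_t length(const char *s)
{
    size_t n;

    for (n = 0; *s++ != '\0'; n++)
        ;
    return n;
}

/* The tutorial's strcopy under another name: copies s into t,
   including the '\0'; t must have room for all of it. */
void copy_str(const char *s, char *t)
{
    while ((*t++ = *s++) != '\0')
        ;
}
```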
For arguments to a function, and there only, the declarations

char s[ ];
char *s;
are equivalent — a pointer to a type, or an array of unspecified size of that type, are the same 
thing. 
If this all seems mysterious, copy these forms until they become second nature. You 
don't often need anything more complicated. 


17. Function Arguments 


Look back at the function strcopy in the previous section. We passed it two string
names as arguments, then proceeded to clobber both of them by incrementation. So how
come we don't lose the original strings in the function that called strcopy?


As we said before, C is a "call by value" language: when you make a function call like
f(x), the value of x is passed, not its address. So there's no way to alter x from inside f. If x is
an array (char x[10]) this isn't a problem, because x is an address anyway, and you're not trying
to change it, just what it addresses. This is why strcopy works as it does. And it's convenient
not to have to worry about making temporary copies of the input arguments.


But what if x is a scalar and you do want to change it? In that case, you have to pass the
address of x to f, and then use it as a pointer. Thus for example, to interchange two integers,
we must write


flip(x, y)
   int *x, *y; {
	int temp;
	temp = *x;
	*x = *y;
	*y = temp;
}

and to call flip, we have to pass the addresses of the variables:
flip(&a, &b);
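A sketch contrasting the two cases in modern C: no_flip (a hypothetical name) receives copies of its arguments and cannot affect its caller, while flip receives addresses and can.

```c
/* Swaps only its local copies; the caller's variables are untouched. */
void no_flip(int x, int y)
{
    int temp = x;
    x = y;
    y = temp;
}

/* Exchanges the integers that x and y point to. */
void flip(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}
```

After no_flip(a, b) the values of a and b are unchanged; after flip(&a, &b) they are exchanged.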


18. Multiple Levels of Pointers; Program Arguments 


When a C program is called, the arguments on the command line are made available to 
the main program as an argument count argc and an array of character strings argv containing 
the arguments. Manipulating these arguments is one of the most common uses of multiple 
levels of pointers ("pointer to pointer to ..."). By convention, argc is greater than zero; the
first argument (in argv[0]) is the command name itself.


Here is a program that simply echoes its arguments. 


main(argc, argv)
   int argc;
   char **argv; {
	int i;
	for( i=1; i < argc; i++ )
		printf("%s ", argv[i]);
	putchar('\n');
}

Step by step: main is called with two arguments, the argument count and the array of
arguments. argv is a pointer to an array, whose individual elements are pointers to arrays of
characters. The zeroth argument is the name of the command itself, so we start to print with the
first argument, until we've printed them all. Each argv[i] is a character array, so we use a '%s'
in the printf.
You will sometimes see the declaration of argv written as 
char *argv[ ];
which is equivalent. But we can't use char argv[ ][ ], because both dimensions are variable and
there would be no way to figure out how big the array is. 


Here's a bigger example using argc and argv. A common convention in C programs is
that if the first argument is '-', it indicates a flag of some sort. For example, suppose we want
a program to be callable as

prog -abc arg1 arg2 ...
where the '-' argument is optional; if it is present, it may be followed by any combination of
a, b, and c.

main(argc, argv)
   int argc;
   char **argv; {
	...
	aflag = bflag = cflag = 0;
	if( argc > 1 && argv[1][0] == '-' ) {
		for( i=1; (c=argv[1][i]) != '\0'; i++ )
			if( c=='a' )
				aflag++;
			else if( c=='b' )
				bflag++;
			else if( c=='c' )
				cflag++;
			else
				printf("%c?\n", c);
		--argc;
		++argv;
	}
	...

There are several things worth noticing about this code. First, there is a real need for the
left-to-right evaluation that && provides; we don't want to look at argv[1] unless we know it's
there. Second, the statements

--argc;
++argv;
let us march along the argument list by one position, so we can skip over the flag argument as
if it had never existed; the rest of the program is independent of whether or not there was a
flag argument. This only works because argv is a pointer which can be incremented.
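The flag-scanning logic can be pulled into a function and tested with a hand-built argument vector. This is a sketch: parse_flags is a hypothetical name, it reports unknown letters on stderr, and it returns how many arguments the caller should skip (1 if a flag argument was consumed, otherwise 0), which is where the caller would do the --argc and ++argv.

```c
#include <stdio.h>

/* Counts the letters a, b and c in argv[1] when it starts with '-'.
   Returns 1 if that flag argument was consumed, else 0. */
int parse_flags(int argc, char **argv, int *aflag, int *bflag, int *cflag)
{
    int i, c;

    *aflag = *bflag = *cflag = 0;
    if (argc > 1 && argv[1][0] == '-') {   /* && keeps us away from a missing argv[1] */
        for (i = 1; (c = argv[1][i]) != '\0'; i++) {
            if (c == 'a')
                (*aflag)++;
            else if (c == 'b')
                (*bflag)++;
            else if (c == 'c')
                (*cflag)++;
            else
                fprintf(stderr, "%c?\n", c);
        }
        return 1;
    }
    return 0;
}
```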


19. The Switch Statement; Break; Continue 
The switch statement can be used to replace the multi-way test we used in the last exam- 
ple. When the tests are like this: 
if( c == 'a' ) ...
else if( c == 'b' ) ...
else if( c == 'c' ) ...
else ...
testing a value against a series of constants, the switch statement is often clearer and usually
gives better code. Use it like this:


switch( c ) {

case 'a':
	aflag++;
	break;
case 'b':
	bflag++;
	break;
case 'c':
	cflag++;
	break;
default:
	printf("%c?\n", c);
	break;
}

The case statements label the various actions we want; default gets done if none of the other
cases are satisfied. (A default is optional; if it isn’t there, and none of the cases match, you 
just fall out the bottom.) 


The break statement in this example is new. It is there because the cases are just labels, 
and after you do one of them, you fall through to the next unless you take some explicit action
to escape. This is a mixed blessing. On the positive side, you can have multiple cases on a 
single statement; we might want to allow both upper and lower case letters in our flag field, so 
we could say 


case ‘a’: case ‘A’: 
case ‘dD’: case 'B’: 
etc. 
But what if we just want to get out after doing,case ‘a’ ? We could get out of a case of the 


Switch with a label and a goto, but this is really ugly. The break statement lets us exil 
without either goto or label. 


    switch( c ) {

    case 'a':
        aflag++;
        break;
    case 'b':
        bflag++;
        break;
     ...
    }
    /* the break statements get us here directly */
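The stacked case labels and break statements described above can be sketched as a small function in modern C. The name count_flags is hypothetical. The function accepts either case for each flag letter and counts any other character as unknown. Here break leaves only the switch, not the enclosing for loop.

```c
/* Hypothetical helper: counts a/A and b/B flags in s and returns the
   number of unrecognized characters. */
static int count_flags(const char *s, int *aflag, int *bflag)
{
    int unknown = 0;

    *aflag = *bflag = 0;
    for (; *s != '\0'; s++) {
        switch (*s) {
        case 'a':
        case 'A':            /* two labels share one statement list */
            (*aflag)++;
            break;           /* leave the switch; do not fall into case 'b' */
        case 'b':
        case 'B':
            (*bflag)++;
            break;
        default:
            unknown++;
            break;
        }
    }
    return unknown;
}
```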

The break statement also works in for and while statements; it causes an immediate exit
from the loop.


The continue statement works only inside for's and while's; it causes the next iteration of
the loop to be started. This means it goes to the increment part of the for and the test part of
the while. We could have used a continue in our example to get on with the next iteration of
the for, but it seems clearer to use break instead.
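This is a minimal sketch, in modern ISO C, of break and continue used inside a loop. The function sum_until_zero is hypothetical. The += operator it uses is the modern spelling of the =+ operator discussed in section 25.

```c
/* Hypothetical helper: sums the positive entries of v, stopping at the
   first 0, which acts as a sentinel. */
static int sum_until_zero(const int *v, int n)
{
    int i, sum = 0;

    for (i = 0; i < n; i++) {
        if (v[i] < 0)
            continue;    /* skip negatives: go straight to i++ */
        if (v[i] == 0)
            break;       /* sentinel reached: leave the for loop */
        sum += v[i];
    }
    return sum;
}
```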
20. Structures

The main use of structures is to lump together collections of disparate variable types, so 
they can conveniently be treated as a unit. For example, if we were writing a compiler or as- 
sembler, we might need for each identifier information like its name (a character array), its 
source line number (an integer), some type information (a character, perhaps), and probably a 
usage count (another integer). 


    char    id[10];
    int     line;
    char    type;
    int     usage;


We can make a structure out of this quite easily. We first tell C what the structure will
look like, that is, what kinds of things it contains; after that we can actually reserve storage for
it, either in the same statement or separately. The simplest thing is to define it and allocate
storage all at once:


    struct {
        char    id[10];
        int     line;
        char    type;
        int     usage;
    } sym;


This defines sym to be a structure with the specified shape; id, line, type and usage are 
members of the structure. The way we refer to any particular member of the structure is 


    structure-name . member

as in

    sym.type = 077;
    if( sym.usage == 0 ) ...
    while( sym.id[j++] ) ...
    etc.


Although the names of structure members never stand alone, they still have to be unique (in
this version of C): there can't be another id or usage in some other structure.

So far we haven't gained much. The advantages of structures start to come when we 
have arrays of structures, or when we want to pass complicated data layouts between functions. 
Suppose we wanted to make a symbol table for up to 100 identifiers. We could extend our 
definitions like 


    char    id[100][10];
    int     line[100];
    char    type[100];
    int     usage[100];


but a structure lets us rearrange this spread-out information so all the data about a single identifier is collected into one lump:


    struct {
        char    id[10];
        int     line;
        char    type;
        int     usage;
    } sym[100];
This makes sym an array of structures; each array element has the specified shape. Now we
can refer to members as


    sym[i].usage++;     /* increment usage of i-th identifier */
    for( j=0; sym[i].id[j++] != '\0'; ) ...
    etc.


Thus to print a list of all identifiers that haven't been used, together with their line number, 


    for( i=0; i<nsym; i++ )
        if( sym[i].usage == 0 )
            printf("%d\t%s\n", sym[i].line, sym[i].id);


Suppose we now want to write a function lookup(name) which will tell us if name already
exists in sym, by giving its index, or that it doesn't, by returning a -1. We can't pass a structure to a function directly; we have to either define it externally, or pass a pointer to it. Let's
try the first way first.


    int nsym 0;     /* current length of symbol table */

    struct {
        char    id[10];
        int     line;
        char    type;
        int     usage;
    } sym[100];     /* symbol table */

    main( ) {
        ...
        if( (index = lookup(newname)) >= 0 )
            sym[index].usage++;     /* already there ... */
        else
            install(newname, newline, newtype);
        ...
    }

    lookup(s)
       char *s; {
        int i;
        extern struct {
            char    id[10];
            int     line;
            char    type;
            int     usage;
        } sym[ ];

        for( i=0; i<nsym; i++ )
            if( compar(s, sym[i].id) > 0 )
                return(i);
        return(-1);
    }

    compar(s1,s2)       /* return 1 if s1==s2, 0 otherwise */
       char *s1, *s2; {
        while( *s1++ == *s2 )
            if( *s2++ == '\0' )
                return(1);
        return(0);
    }


The declaration of the structure in lookup isn't needed if the external definition precedes its
use in the same source file, as we shall see in a moment.
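For comparison, here is a minimal sketch of the external-table approach in modern C. It uses the standard library function strcmp in place of the text's compar. strcmp returns 0 on a match, so the test is written as == 0. install is a hypothetical helper, because the text never shows its body. NSYMS is also a hypothetical name.

```c
#include <string.h>

#define NSYMS 100   /* hypothetical table size */

struct symtag {
    char id[10];
    int  line;
    char type;
    int  usage;
};

struct symtag sym[NSYMS];   /* external symbol table */
int nsym = 0;               /* current length of symbol table */

/* Returns the index of name in sym, or -1 if it is not there. */
int lookup(const char *name)
{
    int i;

    for (i = 0; i < nsym; i++)
        if (strcmp(name, sym[i].id) == 0)
            return i;
    return -1;
}

/* Hypothetical install(): appends an entry and returns its index,
   or -1 if the table is full or the name is too long for id[]. */
int install(const char *name, int line, char type)
{
    if (nsym >= NSYMS || strlen(name) >= sizeof sym[0].id)
        return -1;
    strcpy(sym[nsym].id, name);
    sym[nsym].line = line;
    sym[nsym].type = type;
    sym[nsym].usage = 0;
    return nsym++;
}
```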


Now what if we want to use pointers?

    struct symtag {
        char    id[10];
        int     line;
        char    type;
        int     usage;
    } sym[100], *psym;

    psym = &sym[0];     /* or psym = sym; */
This makes psym a pointer to our kind of structure (the symbol table), then initializes it to
point to the first element of sym.


Notice that we added something after the word struct: a "tag" called symtag. This puts
a name on our structure definition so we can refer to it later without repeating the definition.
It's not necessary but useful. In fact we could have said


    struct symtag {
        ... structure definition
    };


which wouldn't have assigned any storage at all, and then said


    struct symtag sym[100];
    struct symtag *psym;

which would define the array and the pointer. This could be condensed further, to

    struct symtag sym[100], *psym;


The way we actually refer to a member of a structure by a pointer is like this:

    ptr -> structure-member


The symbol '->' means we're pointing at a member of a structure; '->' is only used in that
context. ptr is a pointer to (the base of) a structure that contains the structure member. The
expression ptr->structure-member refers to the indicated member of the pointed-to structure. Thus we have constructions like:


    psym->type = 1;
    psym->id[0] = 'a';


and so on. 


For more complicated pointer expressions, it's wise to use parentheses to make it clear
who goes with what. For example,


    struct { int x, *y; } *p;

    p->x++          increments x
    ++p->x          so does this!
    (++p)->x        increments p before getting x
    *p->y++         uses y as a pointer, then increments it
    *(p->y)++       so does this
    *(p++)->y       uses y as a pointer, then increments p


The way to remember these is that ->, . (dot), ( ) and [ ] bind very tightly. An expression involving one of these is treated as a unit. p->x, a[i], y.x and f(b) are names exactly as abc is.
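The table above can be checked mechanically. This sketch in modern C uses the hypothetical names struct pair and demo. It applies several of the expressions to a two-element array. The comments state what each expression does, given that -> binds more tightly than ++ and unary *.

```c
struct pair {
    int  x;
    int *y;
};

/* Hypothetical demo: applies expressions from the table in order and
   reports the values read through y and x. */
static void demo(struct pair a[2], int *r1, int *r2, int *r3)
{
    struct pair *p = a;

    p->x++;           /* (p->x)++ : increments a[0].x */
    ++p->x;           /* ++(p->x) : increments a[0].x again */
    *r1 = *p->y++;    /* *((p->y)++): reads through y, then advances y */
    *r2 = (++p)->x;   /* advances p to a[1] first, then reads x */
    *r3 = *(p++)->y;  /* reads through a[1].y, then advances p */
}
```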


If p is a pointer to a structure, any arithmetic on p takes into account the actual size of
the structure. For instance, p++ increments p by the correct amount to get the next element
of the array of structures. But don't assume that the size of a structure is the sum of the sizes
of its members: because of alignments of different sized objects, there may be "holes" in a
structure.
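Here is a short sketch in modern C of the point about pointer arithmetic. Incrementing a structure pointer moves it by sizeof the whole structure, and that size includes any padding. The function stride is hypothetical. The exact size depends on the compiler's alignment rules, so the test compares the result only against sizeof and makes no assumption about padding.

```c
#include <stddef.h>

struct symtag {
    char id[10];
    int  line;
    char type;
    int  usage;
};

/* Hypothetical helper: byte distance that p++ moves a structure pointer. */
static ptrdiff_t stride(struct symtag *arr)
{
    struct symtag *p = arr;

    p++;                              /* advances by one whole structure */
    return (char *)p - (char *)arr;   /* measured in bytes */
}
```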


Enough theory. Here is the lookup example, this time with pointers. 


    struct symtag {
        char    id[10];
        int     line;
        char    type;
        int     usage;
    } sym[100];

    main( ) {
        struct symtag *lookup( );
        struct symtag *psym;
        ...
        if( (psym = lookup(newname)) )  /* non-zero pointer */
            psym->usage++;              /* means already there */
        else
            install(newname, newline, newtype);
        ...
    }

    struct symtag *lookup(s)
       char *s; {
        struct symtag *p;
        for( p=sym; p < &sym[nsym]; p++ )
            if( compar(s, p->id) > 0 )
                return(p);
        return(0);
    }


The function compar doesn't change: 'p->id' refers to a string.

In main we test the pointer returned by lookup against zero, relying on the fact that a 
pointer is by definition never zero when it really points at something. The other pointer mani- 
pulations are trivial. 
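In modern ISO C, the pointer version of lookup might look like the following sketch. It uses strcmp in place of compar. When the name is absent it returns NULL, the standard name for the null pointer. The table contents in the test are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct symtag {
    char id[10];
    int  line;
    char type;
    int  usage;
};

struct symtag sym[100];
int nsym = 0;   /* current length of symbol table */

/* Returns a pointer to the matching entry, or NULL if name is absent. */
struct symtag *lookup(const char *name)
{
    struct symtag *p;

    for (p = sym; p < &sym[nsym]; p++)
        if (strcmp(name, p->id) == 0)
            return p;
    return NULL;
}
```

As in main above, a caller can test the result directly, because a pointer that points at something is never null.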

The only complexity is the set of lines like 


    struct symtag *lookup( );


This brings us to an area that we will treat only hurriedly — the question of function types. So 
far, all of our functions have returned integers (or characters, which are much the same). 
What do we do when the function returns something else, like a pointer to a structure? The 
rule is that any function that doesn't return an int has to say explicitly what it does return. 
The type information goes before the function name (which can make the name hard to see). 


Examples: 


    char f(a)
       int a; {
        ...
    }

    int *g( ) { ... }

    struct symtag *lookup(s) char *s; { ... }


The function f returns a character, g returns a pointer to an integer, and lookup returns a
pointer to a structure that looks like symtag. And if we're going to use one of these functions,
we have to make a declaration where we use it, as we did in main above.

Notice the parallelism between the declarations


    struct symtag *lookup( );
    struct symtag *psym;

In effect, this says that lookup( ) and psym are both used the same way, as a pointer to a
structure, even though one is a variable and the other is a function.


21. Initialization of Variables 
An external variable may be initialized at compile time by following its name with an initializing value when it is defined. The initializing value has to be something whose value is
known at compile time, like a constant.

    int     x       0;      /* "0" could be any constant */
    int     a       'a';
    char    flag    0177;
    int     *p      &y[1];  /* p now points to y[1] */
An external array can be initialized by following its name with a list of initializations enclosed 
in braces: 


    int     x[4]    {0,1,2,3};          /* makes x[i] = i */
    int     y[ ]    {0,1,2,3};          /* makes y big enough for 4 values */
    char    *msg    "syntax error\n";   /* braces unnecessary here */
    char *keyword[ ] {
        "if",
        "else",
        "for",
        "while",
        "break",
        "continue",
        0
    };
This last one is very useful: it makes keyword an array of pointers to character strings, with
a zero at the end so we can identify the last element easily. A simple lookup routine could
scan this until it either finds a match or encounters a zero keyword pointer:


    lookup(str)     /* search for str in keyword[ ] */
       char *str; {
        int i,j,r;
        for( i=0; keyword[i] != 0; i++) {
            for( j=0; (r=keyword[i][j]) == str[j] && r != '\0'; j++ );
            if( r == str[j] )
                return(i);
        }
        return(-1);
    }
Sorry: neither local variables nor structures can be initialized.
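Modern ISO C writes initializers with an = sign. Unlike the compiler described here, it also allows local variables and structures to be initialized. The following is a minimal sketch of the keyword table and its lookup in that dialect. keyword_lookup is a hypothetical name, and strcmp replaces the hand-written comparison loop.

```c
#include <stddef.h>
#include <string.h>

/* Array of pointers to strings, terminated by a null pointer. */
static const char *keyword[] = {
    "if", "else", "for", "while", "break", "continue", NULL
};

/* Returns the index of str in keyword[], or -1 if it is not a keyword. */
int keyword_lookup(const char *str)
{
    int i;

    for (i = 0; keyword[i] != NULL; i++)
        if (strcmp(keyword[i], str) == 0)
            return i;
    return -1;
}
```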


22. Scope Rules: Who Knows About What 

A complete C program need not be compiled all at once: the source text of the program
may be kept in several files, and previously compiled routines may be loaded from libraries.
How do we arrange that data gets passed from one routine to another? We have already seen
how to use function arguments and values, so let us talk about external data. Warning: the
words declaration and definition are used precisely in this section; don't treat them as the same
thing.
A major shortcut exists for making extern declarations. If the definition of a variable ap- 
pears before its use in some function, no extern declaration is needed within the function. 
Thus, if a file contains 


    f1( ) { ... }

    int foo;
    f2( ) { ... foo = 1; ... }
    f3( ) { ... if ( foo ) ... }


no declaration of foo is needed in either f2 or f3, because the external definition of foo appears before them. But if f1 wants to use foo, it has to contain the declaration


    f1( ) {
        extern int foo;
        ...
    }


This is true also of any function that exists on another file; if it wants foo it has to use
an extern declaration for it. (If somewhere there is an extern declaration for something, there
must also eventually be an external definition of it, or you'll get an "undefined symbol" message.)
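Within a single file, the difference between a declaration and a definition can be sketched in modern C as below. The names f1, f2 and foo follow the text. The function bodies and the value 41 are illustrative assumptions. f1 comes before the definition of foo, so it needs the extern declaration. f2 comes after the definition and needs none.

```c
/* f1 precedes the definition of foo, so it must declare it. */
int f1(void)
{
    extern int foo;   /* declaration: refers to foo, allocates no storage */
    return foo + 1;
}

int foo = 41;         /* definition: allocates storage and initializes it */

/* f2 follows the definition, so no declaration is needed. */
int f2(void)
{
    foo = foo * 2;
    return foo;
}
```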

There are some hidden pitfalls in external declarations and definitions if you use multiple 
source files. To avoid them, first, define and initialize each external variable only once in the 
entire set of files: 


    int foo 0;


You can get away with multiple external definitions on UNIX, but not on GCOS, so don't ask for
trouble. Multiple initializations are illegal everywhere. Second, at the beginning of any file
that contains functions needing a variable whose definition is in some other file, put in an extern declaration, outside of any function:


    extern int foo;

    f1( ) { ... }
    etc.


The #include compiler control line, to be discussed shortly, lets you make a single copy 
of the external declarations for a program and then stick them into each of the source files 
making up the program. 


23. #define, #include
C provides a very limited macro facility. You can say

    #define name something

and thereafter anywhere "name" appears as a token, "something" will be substituted. This is
particularly useful in parameterizing the sizes of arrays:

    #define ARRAYSIZE 100
        int arr[ARRAYSIZE];
        ...
        while( i++ < ARRAYSIZE ) ...


(now we can alter the entire program by changing only the define) or in setting up mysterious
constants:

    #define SET         01
    #define INTERRUPT   02  /* interrupt bit */
    #define ENABLED     04
        ...
    if( x & (SET | INTERRUPT | ENABLED) ) ...
Now we have meaningful words instead of mysterious constants. (The mysterious operators
'&' (AND) and '|' (OR) will be covered in the next section.) It's an excellent practice to write
programs without any literal constants except in #define statements.
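This is a minimal sketch in modern C of both uses of #define described above. The constant names come from the text. fill_arr and any_flag are hypothetical helpers, added so that the definitions can be exercised.

```c
#define ARRAYSIZE  100
#define SET        01
#define INTERRUPT  02   /* interrupt bit */
#define ENABLED    04

static int arr[ARRAYSIZE];

/* Hypothetical helper: fills arr with 0..ARRAYSIZE-1. Changing the
   #define alone resizes both the array and the loop bound. */
static int fill_arr(void)
{
    int i = 0;

    while (i < ARRAYSIZE) {
        arr[i] = i;
        i++;
    }
    return i;
}

/* Hypothetical helper: nonzero if any of the three named bits is on in x. */
static int any_flag(int x)
{
    return (x & (SET | INTERRUPT | ENABLED)) != 0;
}
```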

There are several warnings about #define. First, there's no semicolon at the end of a
#define; all the text from the name to the end of the line (except for comments) is taken to
be the "something". When it's put into the text, blanks are placed around it. Good style typically makes the name in the #define upper case; this makes parameters more visible.
Definitions affect things only after they occur, and only within the file in which they occur.
Defines can't be nested. Last, if there is a #define in a file, then the first character of the file
must be a '#', to signal the preprocessor that definitions exist.


The other control word known to C is #include. To include one file in your source at
compilation time, say

    #include "filename"

This is useful for putting a lot of heavily used data definitions and #define statements at the
beginning of a file to be compiled. As with #define, the first line of a file containing a #include has to begin with a '#'. And #include can't be nested: an included file can't contain
another #include.


24. Bit Operators 
C has several operators for logical bit-operations. For example,

    x = x & 0177;

forms the bit-wise AND of x and 0177, effectively retaining only the last seven bits of x. Other
operators are

    |       inclusive OR
    ^       (circumflex) exclusive OR
    ~       (tilde) 1's complement
    !       logical NOT
    <<      left shift (as in x<<2)
    >>      right shift (arithmetic on PDP-11; logical on H6070, IBM360)
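Each operator in the list can be tried in a few lines of modern C. The helper names below are hypothetical. The sketch uses unsigned operands because right-shifting a negative signed value is implementation-defined in ISO C. This matches the machine dependence the list notes for >>.

```c
/* Hypothetical helpers, one per bit operation. */
static unsigned low7(unsigned x)                   { return x & 0177; }  /* keep last 7 bits */
static unsigned set_bits(unsigned x, unsigned m)   { return x | m; }     /* inclusive OR */
static unsigned toggle(unsigned x, unsigned m)     { return x ^ m; }     /* exclusive OR */
static unsigned clear_bits(unsigned x, unsigned m) { return x & ~m; }    /* AND with complement */
```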
25. Assignment Operators


An unusual feature of C is that the normal binary operators like '+', '-', etc. can be
combined with the assignment operator '=' to form new assignment operators. For example,

    x =- 10;

uses the assignment operator '=-' to decrement x by 10, and

    x =& 0177

forms the AND of x and 0177. This convention is a useful notational shortcut, particularly if x
is a complicated expression. The classic example is summing an array:

    for( sum=i=0; i<n; i++ )
        sum =+ array[i];

But the spaces around the operator are critical! For instance,

    x = -10;

sets x to -10, while

    x =- 10;

subtracts 10 from x. When no space is present,

    x=-10;

also decreases x by 10. This is quite contrary to the experience of most programmers. In particular, watch out for things like

    c=*s++;
    y=&x[0];

both of which are almost certainly not what you wanted. Newer versions of various compilers
are courteous enough to warn you about the ambiguity.

Because all other operators in an expression are evaluated before the assignment operator,
the order of evaluation should be watched carefully:

    x = x<<y | z;

means "shift x left y places, then OR with z, and store in x." But

    x =<< y | z;

means "shift x left by y|z places", which is rather different.
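Later versions of C reversed the spelling of these operators to -=, &=, +=, <<= and so on. With that spelling, x=-10 unambiguously assigns -10. The following is a minimal sketch of the examples above in the modern spelling. The function names are hypothetical.

```c
/* x = x<<y | z : shift x left y places, then OR with z. */
static int shift_then_or(int x, int y, int z)
{
    x = x << y | z;   /* << binds tighter than |, so this is (x << y) | z */
    return x;
}

/* Modern form of the text's x =<< y | z : shift x left by y|z places. */
static int shift_by_or(int x, int y, int z)
{
    x <<= y | z;      /* the whole right side is evaluated first */
    return x;
}

/* The classic summation, with += in place of =+. */
static int sum(const int *array, int n)
{
    int s, i;

    for (s = i = 0; i < n; i++)
        s += array[i];
    return s;
}
```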


26. Floating Point 

We've skipped over floating point so far, and the treatment here will be hasty. C has single and double precision numbers (where the precision depends on the machine at hand). For
example,

    double sum;
    float avg, y[10];
    sum = 0.0;
    for( i=0; i<n; i++ )
        sum =+ y[i];
    avg = sum/n;


forms the sum and average of the array y. 


All floating arithmetic is done in double precision. Mixed mode arithmetic is legal: if an
arithmetic operator in an expression has both operands int or char, the arithmetic done is integer, but if one operand is int or char and the other is float or double, both operands are converted to double. Thus if i and j are int and x is float,


    (x+i)/j     converts i and j to float
    x + i/j     does i/j integer, then converts


Type conversion may be made by assignment; for instance,

    int m, n;
    float x, y;
    m = x;
    y = n;

converts x to integer (truncating toward zero), and n to floating point.
Floating constants are just like those in Fortran or PL/I, except that the exponent letter is
'e' instead of 'E'. Thus

    pi = 3.14159;
    large = 1.23456789e10;


printf will format floating point numbers: "%w.df" in the format string will print the
corresponding variable in a field w digits wide, with d decimal places. An e instead of an f will
produce exponential notation.
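This is a minimal sketch in modern C of the floating-point points above: accumulating a sum in double, mixed-mode conversion, and a %w.df format (here %8.2f). average and format_fixed are hypothetical names. snprintf is the standard bounded form of sprintf.

```c
#include <stdio.h>

/* Hypothetical helper: average of n floats, summed in double. */
static float average(const float *y, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += y[i];
    return (float)(sum / n);   /* n is converted to double for the division */
}

/* Hypothetical helper: prints v in a field 8 wide with 2 decimal places. */
static void format_fixed(char *buf, size_t size, double v)
{
    snprintf(buf, size, "%8.2f", v);
}
```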


27. Horrors! goto's and labels
C has a goto statement and labels, so you can branch about the way you used to. But
most of the time goto's aren't needed. (How many have we used up to this point?) The code
can almost always be more clearly expressed by for/while, if/else, and compound statements.
One use of goto's with some legitimacy is in a program which contains a long loop,
where a while(1) would be too extended. Then you might write


    mainloop:
        ...
        goto mainloop;


Another use is to implement a break out of more than one level of for or while. goto's can
only branch to labels within the same function.
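Here is a sketch in modern C of using goto to leave two nested loops at once. find is a hypothetical function that searches a grid stored row by row. Without goto, it would need an extra flag variable tested in both loop conditions.

```c
/* Hypothetical helper: searches a rows x cols grid for target. Returns 1
   and stores the position if found, otherwise returns 0. */
static int find(const int *grid, int rows, int cols, int target,
                int *row, int *col)
{
    int i, j;

    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            if (grid[i * cols + j] == target)
                goto found;       /* leaves both loops at once */
    return 0;

found:
    *row = i;
    *col = j;
    return 1;
}
```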
A New Input-Output Package


D. M. Ritchie 


Bell Laboratories 
Murray Hill, New Jersey 07974 


A new package of IO routines is available. It was designed with the following goals in
mind.
1. It should be similar in spirit to the earlier Portable Library, and, to the extent possible, be 
compatible with it. At the same time a few dubious design choices in the Portable Library 
will be corrected. 


2. It must be as efficient as possible, both in time and in space, so that there will be no hesi- 
tation in using it no matter how critical the application. 


3. It must be simple to use, and also free of the magic numbers and mysterious calls the use 
of which mars the understandability and portability of many programs using older pack- 
ages. 


4. The interface provided should be applicable on all machines, whether or not the programs 
which implement it are directly portable to other systems, or to machines other than the 
PDP11 running a version of Unix. 


It is intended that this package replace the Portable Library. Although it is not directly 
compatible, as discussed below, it is sufficiently similar that modifying programs to use it 
should be a simple exercise. 


The most crucial difference between this package and the Portable Library is that the 
current offering names streams in terms of pointers rather than by the integers known as ‘file 
descriptors.’ Thus, for example, the routine which opens a named file returns a pointer to a cer- 
tain structure rather than a number; the routine which reads an open file takes as an argument 
the pointer returned from the open call. 


General Usage 
Each program using the library must have the line 


#include <stdio.h> 


which defines certain macros and variables. The library containing the routines is
'/usr/lib/libS.a,' so the command to compile is

    cc ... -lS


All names in the include-file intended only for internal use begin with an underscore ‘_’ to 
reduce the possibility of collision with a user name. The names intended to be visible outside 
the package are 


stdin     The name of the standard input file
stdout    The name of the standard output file
stderr    The name of the standard error file

EOF       is actually -1, and is the value returned by the read routines on end-of-file or error.

NULL      is a notation for the null pointer, returned by pointer-valued functions to indicate an
          error.

FILE      expands to 'struct _iob' and is a useful shorthand when declaring pointers to
          streams.

BUFSIZ    is a number (viz. 512) of the size suitable for an IO buffer supplied by the user. See
          setbuf, below.

getc, getchar, putc, putchar, feof, ferror, fileno
          are defined as macros. Their actions are described below; they are mentioned here
          to point out that it is not possible to redeclare them and that they are not actually
          functions; thus, for example, they may not have breakpoints set on them.

The routines in this package, like the Portable Library, offer the convenience of automatic
buffer allocation and output flushing where appropriate. Absent, however, is the facility of
changing the default input and output streams by assigning to 'cin' and 'cout.' The names
'stdin,' 'stdout,' and 'stderr' are in effect constants and may not be assigned to.


Calls 

The routines in the library are in nearly one-to-one correspondence with those in the 
Portable Library. In several cases the name has been changed. This is an attempt to reduce 
confusion. 
FILE *fopen(filename, type) char *filename, *type
Fopen opens the file and, if needed, allocates a buffer for it. Filename is a character string specifying the name. Type is a character string (not a single character). It may be "r", "w", or
"a" to indicate intent to read, write, or append. The value returned is a file pointer. If it is
NULL the attempt to open failed.
FILE *freopen(filename, type, ioptr) char *filename, *type; FILE *ioptr
The stream named by ioptr is closed, if necessary, and then reopened as if by fopen. If the
attempt to open fails, NULL is returned, otherwise ioptr, which will now refer to the new file.
Often the reopened stream is stdin or stdout.
int getc(ioptr) FILE *ioptr
returns the next character from the stream named by ioptr, which is a pointer to a file such as
returned by fopen, or the name stdin. The integer EOF is returned on end-of-file or when an
error occurs. The null character '\0' is a legal character.
int fgetc(ioptr) FILE *ioptr
acts like getc but is a genuine function, not a macro.


putc(c, ioptr) FILE *ioptr

Putc writes the character c on the output stream named by ioptr, which is a value returned from
fopen or perhaps stdout or stderr. The character is returned as value, but EOF is returned on
error.

fputc(c, ioptr) FILE *ioptr

Fputc acts like putc but is a genuine function, not a macro.

fclose(ioptr) FILE *ioptr

The file corresponding to ioptr is closed after any buffers are emptied. A buffer allocated by the
IO system is freed. Fclose is automatic on normal termination of the program.
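This is a minimal sketch of the basic read/write loop using the standard C versions of these routines, which keep the names described here. copy_stream is a hypothetical name. The variable c is an int rather than a char, so that EOF stays distinguishable from every real character, including '\0'.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out one character at a time and
   returns the number of characters copied. */
static long copy_stream(FILE *in, FILE *out)
{
    int c;        /* int, not char, so EOF cannot collide with a character */
    long n = 0;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```

A caller would typically obtain in from fopen(filename, "r"), check it against NULL, and fclose it when done.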

fflush(ioptr) FILE *ioptr

Any buffered information on the (output) stream named by ioptr is written out. Output files
are normally buffered if and only if they are not directed to the terminal, but stderr is
unbuffered unless setbuf is used.


exit(errcode) 

Exit terminates the process and returns its argument as status to the parent. This is a special
version of the routine which calls fflush for each output file. To terminate without flushing, use
_exit.

feof(ioptr) FILE *ioptr

returns non-zero when end-of-file has occurred on the specified input stream.


ferror(ioptr) FILE *ioptr

returns non-zero when an error has occurred while reading or writing the named stream. The
error indication lasts until the file has been closed.


getchar( ) 
is identical to getc(stdin). 


putchar(c)
is identical to putc(c, stdout).


char *gets(s) char *s

reads characters up to a new-line from the standard input. The new-line character is replaced
by a null character. It is the user's responsibility to make sure that the character array s is large
enough. Gets returns its argument, or NULL if end-of-file or error occurred. Note that this
routine is not compatible with fgets; it is included for downward compatibility.


char *fgets(s, n, ioptr) char *s; FILE *ioptr

reads up to n characters from the stream ioptr into the character pointer s. The read terminates
with a new-line character. The new-line character is placed in the buffer followed by a null
character. The first argument, or NULL if error or end-of-file occurred, is returned.


puts(s) char *s

writes the null-terminated string (character array) s on the standard output. A new-line is
appended. No value is returned. Note that this routine is not compatible with fputs; it is
included for downward compatibility.


fputs(s, ioptr) char *s; FILE *ioptr
writes the null-terminated string (character array) s on the stream ioptr. No new-line is
appended. No value is returned.
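Here is a sketch of line-at-a-time processing with fgets and fputs in standard C. grep_lines is a hypothetical function, and the 256-byte buffer is an assumption. Standard C's fgets stores at most n-1 characters and always adds the terminating null. This sketch assumes lines shorter than its buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies to out every line of in that contains key
   and returns the number of lines copied. */
static int grep_lines(FILE *in, FILE *out, const char *key)
{
    char line[256];   /* assumed large enough for any input line */
    int n = 0;

    while (fgets(line, sizeof line, in) != NULL)
        if (strstr(line, key) != NULL) {
            fputs(line, out);   /* the line keeps its own new-line */
            n++;
        }
    return n;
}
```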


ungetc(c, ioptr) FILE *ioptr
The argument character c is pushed back on the input stream named by ioptr. Only one character may be pushed back.


printf(format, a1, ...) char *format

fprintf(ioptr, format, a1, ...) FILE *ioptr; char *format

sprintf(s, format, a1, ...) char *s, *format

Printf writes on the standard output. Fprintf writes on the named output stream. Sprintf puts
characters in the character array (string) named by s. The specifications are as described in section printf (III) of the Unix Programmer's Manual. There is a new conversion: %m.ng converts
a double argument in the style of e or f as most appropriate.
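The %g conversion mentioned above (the %m.ng form without a width or precision) can be tried with snprintf, the bounded modern counterpart of sprintf. format_g is a hypothetical wrapper. In standard C the default precision for %g is 6 significant digits.

```c
#include <stdio.h>

/* Hypothetical wrapper: formats v with %g, which chooses e-style or
   f-style output depending on the exponent. Returns the length. */
static int format_g(char *buf, size_t size, double v)
{
    return snprintf(buf, size, "%g", v);
}
```

For example, 0.5 prints as 0.5, while 1234567.0 needs more than six significant digits and switches to e-style as 1.23457e+06.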


scanf(format, a1, ...) char *format
fscanf(ioptr, format, a1, ...) FILE *ioptr; char *format
sscanf(s, format, a1, ...) char *s, *format

Scanf reads from the standard input. Fscanf reads from the named input stream. Sscanf reads
from the character string supplied as s. The specifications are identical to those of the Portable
Library. Scanf reads characters, interprets them according to a format, and stores the results in
its arguments. It expects as arguments a control string format, described below, and a set of
arguments, each of which must be a pointer, indicating where the converted input should be
stored.


The control string usually contains conversion specifications, which are used to direct 
interpretation of input sequences. The control string may contain: 


1. Blanks, tabs or newlines, which are ignored. 


2. Ordinary characters (not %) which are expected to match the next non-space character of 
the input stream (where space characters are defined as blank, tab or newline). 


3. Conversion specifications, consisting of the character %, an optional assignment suppressing character *, an optional numerical maximum field width, and a conversion character.

A conversion specification is used to direct the conversion of the next input field; the
result is placed in the variable pointed to by the corresponding argument, unless assignment
suppression was indicated by the * character. An input field is defined as a string of non-space
characters; it extends either to the next space character or until the field width, if specified, is
exhausted.


The conversion character indicates the interpretation of the input field; the corresponding
pointer argument must usually be of a restricted type. The following conversion characters are
legal:

%   indicates that a single % character is expected in the input stream at this point; no assignment is done.

d   indicates that a decimal integer is expected in the input stream; the corresponding argument should be an integer pointer.

o   indicates that an octal integer is expected in the input stream; the corresponding argument
    should be an integer pointer.

x   indicates that a hexadecimal integer is expected in the input stream; the corresponding
    argument should be an integer pointer.

s   indicates that a character string is expected; the corresponding argument should be a character pointer pointing to an array of characters large enough to accept the string and a terminating '\0', which will be added. The input field is terminated by a space character or a
    newline.

c   indicates that a character is expected; the corresponding argument should be a character
    pointer; the next input character is placed at the indicated spot. The normal skip over
    space characters is suppressed in this case; to read the next non-space character, try %1s.
    If a field width is given, the corresponding argument should refer to a character array, and
    the indicated number of characters is read.

e (or f)
    indicates that a floating point number is expected in the input stream; the next field
    is converted accordingly and stored through the corresponding argument, which should be
    a pointer to a float. The input format for floating point numbers is an optionally signed
    string of digits possibly containing a decimal point, followed by an optional exponent field
    beginning with an E or e followed by an optionally signed integer.

[   indicates a string not to be delimited by space characters. The left bracket is followed by a
    set of characters and a right bracket; the characters between the brackets define a set of
    characters making up the string. If the first character is not circumflex (^), the input field
    is all characters until the first character not in the set between the brackets; if the first
    character after the left bracket is ^, the input field is all characters until the first character
    which is in the remaining set of characters between the brackets. The corresponding argument must point to a character array.


The conversion characters d, o and x may be capitalized or preceded by l to indicate that a
pointer to long rather than int is expected. Similarly, the conversion characters e or f may be
capitalized or preceded by l to indicate that a pointer to double rather than float is in the argument list. The character h will function similarly in the future to indicate short data items.


For example, the call

    int i; float x; char name[50];
    scanf("%d%f%s", &i, &x, name);

with the input line

    25 54.32E-1 thompson

will assign to i the value 25, x the value 5.432, and name will contain "thompson\0". Or,

    int i; float x; char name[50];
    scanf("%2d%f%*d%[1234567890]", &i, &x, name);

with input

    56789 0123 56a72

will assign 56 to i, 789.0 to x, skip "0123", and place the string "56\0" in name. The next call
to getchar will return 'a'.
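Both examples can be reproduced with sscanf in standard C, with one assumption to flag. In ISO C the %[ conversion does not skip leading white space, so the sketch puts a space before it to get the behaviour described. The %n conversion records how many characters were consumed, which lets the test check that the next unread character is 'a'. parse_example is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: parses the second example with sscanf. The space
   before %[ is needed in ISO C; %*d reads "0123" and discards it. */
static int parse_example(const char *input, int *i, float *x,
                         char name[50], int *used)
{
    *used = 0;
    return sscanf(input, "%2d%f%*d %49[1234567890]%n", i, x, name, used);
}
```

The return value counts only assigned items. %*d and %n do not contribute to it, so a full match returns 3.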


Scanf returns as its value the number of successfully matched and assigned input items.
This can be used to decide how many input items were found. On end of file, EOF is returned;
note that this is different from 0, which means that the next input character does not match
what was called for in the control string.
fread(ptr, sizeof(*ptr), nitems, ioptr) FILE *ioptr
reads nitems of data beginning at ptr from file ioptr. It behaves identically to the Portable
Library’s cread. No advance notification that binary IO is being done is required; when, for 
portability reasons, it becomes required, it will be done by adding an additional character to the 
mode-string on the fopen call. 
fwrite(ptr, sizeof(*ptr), nitems, ioptr) FILE *ioptr 
Like fread, but in the other direction. 


rewind(ioptr) FILE *ioptr

rewinds the stream named by ioptr. It is not very useful except on input, since a rewound output file is still open only for output.

system (string) char *string 

The string is executed by the shell as if typed at the terminal. 

getwlioptr) FILE *ioptr 

returns the next word from the input stream named by ioptr. EOF is returned on end-of-file or
error, but since this is a perfectly good integer, feof and ferror should be used.
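Because a stored word can legitimately equal EOF (-1), a reader should consult feof and ferror before treating EOF as the end. The following is a minimal sketch in modern C; read_words is a hypothetical helper, and the _DEFAULT_SOURCE define is an assumption needed on glibc to expose the legacy getw()/putw() declarations.

```c
#define _DEFAULT_SOURCE   /* glibc: expose the legacy getw()/putw() */
#include <stdio.h>

/* Hypothetical helper: reads up to max words from fp into out[].
   Returns the number of words read, or -1 on a read error. */
int read_words(FILE *fp, int *out, int max)
{
    int n = 0;

    while (n < max) {
        int w = getw(fp);

        /* EOF (-1) is also a valid word, so check the stream state. */
        if (w == EOF && (feof(fp) || ferror(fp)))
            break;
        out[n++] = w;
    }
    return ferror(fp) ? -1 : n;
}
```

A word of -1 written with putw is therefore read back as data rather than mistaken for end of file.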

putw(w, ioptr) FILE *ioptr

writes the integer w on the named output stream. 

setbuf(ioptr, buf) FILE *ioptr; char *buf

Setbuf may be used after a stream has been opened but before IO has started. If buf is NULL,
the stream will be unbuffered. Otherwise the buffer supplied will be used. It is a character
array of sufficient size:

char buf[BUFSIZ];


fileno(ioptr) FILE *ioptr 
returns the integer file descriptor associated with the file. 
fseek(ioptr, offset, ptrname) FILE *ioptr; long offset 


The location of the next byte in the stream named by ioptr is adjusted. Offset is a long integer.
If ptrname is 0, the offset is measured from the beginning of the file; if ptrname is 1, the offset
is measured from the current read or write pointer; if ptrname is 2, the offset is measured from
the end of the file. The routine accounts properly for any buffering. (When this routine is
used on non-Unix systems, the offset must be a value returned from ftell and the ptrname must
be 0.)


long ftell(ioptr) FILE *ioptr 

The byte offset, measured from the beginning of the file, associated with the named stream is 
returned. Any buffering is properly accounted for. (On non-Unix systems the value of this call
is useful only for handing to fseek, so as to position the file to the same place it was when ftell
was called.)
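As an illustration of fseek and ftell together, here is a minimal sketch in modern C that measures a file and then restores the original position. file_size is a hypothetical helper; it uses the standard SEEK_END and SEEK_SET macros, which on UNIX have the ptrname values 2 and 0 described above.

```c
#include <stdio.h>

/* Hypothetical helper: returns the size in bytes of the file behind fp,
   or -1L on error, leaving the stream where it was. */
long file_size(FILE *fp)
{
    long here = ftell(fp);          /* remember the current position */
    long size;

    if (here < 0 || fseek(fp, 0L, SEEK_END) != 0)   /* ptrname 2 */
        return -1L;
    size = ftell(fp);               /* offset of end of file = size */
    if (fseek(fp, here, SEEK_SET) != 0)             /* ptrname 0 */
        return -1L;
    return size;
}
```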


getpw(uid, buf) char *buf 

The password file is searched for the given integer user ID. If an appropriate line is found, it is 
copied into the character array buf, and 0 is returned. If no line is found corresponding to the
user ID then 1 is returned.


char *calloc(num, size) 


allocates space for num items each of size size. The space is guaranteed to be set to 0 and the 
pointer is sufficiently well aligned to be usable for any purpose. NULL is returned if no space 
is available. 


cfree(ptr) char *ptr

Space is returned to the pool used by calloc. Disorder can be expected if the pointer was not
obtained from calloc.

The following are macros defined by stdio.h. 


isalpha(c) 
returns non-zero if the argument is alphabetic. 


isupper(c) 
returns non-zero if the argument is upper-case alphabetic. 


islower(c) 

returns non-zero if the argument is lower-case alphabetic. 
isdigit(c) 

returns non-zero if the argument is a digit. 

isspace(c) 


returns non-zero if the argument is a spacing character: tab, new-line, carriage return, vertical 
tab, form feed, space. 


toupper(c) 
returns the upper-case character corresponding to the lower-case letter c. 


tolower(c) 
returns the lower-case character corresponding to the upper-case letter c. 
ABSTRACT


A new general-purpose subroutine library has been written for PWB/UNIX. It complements
the functions provided by D. M. Ritchie’s A New Input-Output Package. This library has
been used in the implementation of Release 4 of the Programmer’s Workbench Source Code
Control System (SCCS/PWB), and many small UNIX C programs. It is efficient in time and
space, as well as being easy to use. This document is a user’s guide to this library.


1. “INCLUDE” FILES


The directory “/usr/include” contains public include files. Users of the subroutine library should be
familiar with the contents of these files. The following are available: 


archive.h Defines the “magic” number of an archive file and declares a structure for the header
of an archive file. 

ctype.h Defines macros for testing whether a character is alphabetic, upper-case, lower-case, a
digit, a “space,” and for converting upper-case characters to lower-case and vice-
versa.

dir.h Declares a structure for a directory entry. 

errnos.h Defines system call error numbers (see INTRO(II)). These were copied from the
UNIX system source.

fatal.h Defines certain macros, constants, and variables used by the general-purpose error 
and signal handling subroutines (see below). 

macros.h Defines some general-purpose macros. 

misc.h Declares some unnamed structures for accessing pieces of a variable (e.g., the low 
byte of an integer). These were copied from the UNIX system source. 

stat.h Declares an inode structure to be used with stat(II) and fstat(II), and defines con-
stants for the various mode bits of an inode.

stdio.h Used by A New Input-Output Package. 

time.h Declares a structure to be used with localtime(III).

system.h Defines certain system constants; in particular, signal numbers. These were copied
from the UNIX system source.


2. SUBROUTINES 


There are four sets of subroutines available. Three of these sets are kept in one library
(“/usr/lib/libpw.a”, accessible as -lpw), and the fourth set is kept in a second library
(“/usr/lib/libwrt.a”, accessible as -lwrt) for reasons that will be explained below. -lwrt should
normally be last on the cc argument list.


The first library (-lpw) contains the string, error, and sys sets. The second library (-lwrt) contains
only the write set (see below).


The string set can be used in conjunction with any other subroutines. The string set is independent of
the other sets and of other subroutine libraries; no subroutine in the string set calls any subroutine not
in the string set.
The error set provides general-purpose error and signal handling routines.


The sys set provides interfaces to most of the commonly used system calls. These interfaces will call
fatal() (see the error set) if an error condition is detected in the interface.


The write set contains an interface to the write(II) system call; this interface handles errors and calls
fatal(), if necessary. If the write set is used, then all calls to the subroutine write() will be directed to
this write routine (remember, it will call fatal() if an error is detected). It is for this reason that the
write set is in a separate library; if all routines were in the same library, then users of the other sets
would unwittingly be using the write set’s write() instead of the write(II) system call.


The subroutines presented here do not call nargs(III), as the nargs(III) subroutine does not work if a 
program is loaded with separate data and instruction spaces. 


2.1 String Set 


char *alloca(nbytes) 
Allocates nbytes bytes of automatic memory. Automatic memory is freed upon return from the calling
function (space is allocated from the stack). There is no way to explicitly free a piece of memory gotten
from alloca(). Alloca() returns the address of the allocated area; a memory fault is generated when
there isn’t enough memory.


N. B.: Use of alloca() as an argument to another subroutine will not work correctly because arguments
are pushed onto the stack and alloca() takes memory from the stack. It is necessary to first
set a temporary variable equal to the value returned by alloca(), and then pass that variable to the
other subroutine.
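The recommended pattern, storing the alloca() result in a temporary before passing it on, can be sketched in modern C. alloca() is not ISO C; this sketch assumes a system (such as glibc or a BSD libc) that declares it in <alloca.h>, and joined_length is a hypothetical example function. Modern compilers generally cope with alloca() inside an argument list, but the temporary-variable form is safe everywhere.

```c
#include <alloca.h>   /* not ISO C; provided by glibc and BSD libcs */
#include <string.h>

/* Hypothetical example: builds "prefix/name" in stack memory and
   returns its length. The buffer vanishes when this function returns,
   so its address must never be returned to the caller. */
size_t joined_length(const char *prefix, const char *name)
{
    size_t n = strlen(prefix) + 1 + strlen(name) + 1;
    char *tmp = alloca(n);   /* store in a temporary first, as advised */

    strcpy(tmp, prefix);
    strcat(tmp, "/");
    strcat(tmp, name);
    return strlen(tmp);
}
```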


any(c, str) 
If character c is equal to any character in the string str, returns 1; else returns 0. 


anystr(str1, str2)
If any character of string str1 is equal to any character in string str2, returns the offset (in str1) of the
first such match; else returns -1.
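Both any() and anystr() map naturally onto ISO C string routines. The following is a minimal sketch of the documented behavior, not the PWB library source; anystr() here uses strcspn(), which counts the leading characters of str1 that do not occur in str2.

```c
#include <string.h>

/* any(): 1 if character c occurs in str, else 0. */
int any(int c, const char *str)
{
    /* strchr() would match the terminator for c == '\0', so reject it. */
    return c != '\0' && strchr(str, c) != NULL;
}

/* anystr(): offset in str1 of the first character that also occurs
   in str2, or -1 if there is none. */
int anystr(const char *str1, const char *str2)
{
    size_t n = strcspn(str1, str2);   /* prefix length with no str2 chars */

    return str1[n] != '\0' ? (int)n : -1;
}
```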


balbrk(str, open, clos, end) 
Finds the offset, in string str, of the first of the characters in the string end occurring outside of a bal-
anced string. A balanced string contains matched occurrences of any character in the string open and the
corresponding character in the string clos. Balanced strings may be nested. In addition to the characters
in end, the null character is implicitly an end character. Unmatched members of open or clos result in
an error return (a value of -1 is returned).


Example 1: 


balbrk(s, o, c, e) returns 7.
Example 2:
s = "albe=2=3";
with o, c, and e as in Example 1, balbrk(s, o, c, e) returns -1.
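One way to realize balbrk() is with a stack of the closing characters still expected. The following is a minimal sketch in modern C, not the PWB library source. The fixed nesting limit BALBRK_MAXDEPTH is an assumption of this sketch, and clos is assumed to be at least as long as open.

```c
#include <string.h>

#define BALBRK_MAXDEPTH 64   /* hypothetical nesting limit for this sketch */

int balbrk(const char *str, const char *open, const char *clos, const char *end)
{
    char want[BALBRK_MAXDEPTH];   /* closing characters still expected */
    int depth = 0;
    int i;

    for (i = 0; ; i++) {
        char ch = str[i];
        const char *p;

        if (ch == '\0')                          /* null is an implicit end */
            return depth == 0 ? i : -1;          /* unmatched opener: error */
        if (depth > 0 && ch == want[depth - 1])  /* closes innermost string */
            depth--;
        else if ((p = strchr(open, ch)) != NULL) {   /* opens a nested one */
            if (depth == BALBRK_MAXDEPTH)
                return -1;
            want[depth++] = clos[p - open];
        } else if (strchr(clos, ch) != NULL)     /* unmatched closer: error */
            return -1;
        else if (depth == 0 && strchr(end, ch) != NULL)
            return i;                            /* end char outside balance */
    }
}
```

For example, balbrk("a[b=c]=d", "[", "]", "=") returns 6: the first "=" is inside the brackets and is skipped.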


char *cat(dest, source1, source2, ..., sourcen, 0)
Concatenates strings. First, string source1 is copied to string dest. Then subsequent sourcei strings are
concatenated (by copying) onto the end of dest. The space for dest must be allocated by the caller (i.e.,
dest is taken to be the address of an area of memory large enough to hold the result). The address of
the result (i.e., dest) is returned.
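A zero-terminated argument list like that of cat() is written with <stdarg.h> in modern C. This is a minimal sketch of the documented behavior, not the library source. One assumption worth stating: on current machines the terminating 0 should be passed as (char *)0, because a plain 0 in a variadic call is an int and may not have the size of a pointer.

```c
#include <stdarg.h>
#include <string.h>

/* Copies source1, source2, ... into dest until a null pointer argument;
   returns dest. The caller must supply enough space in dest. */
char *cat(char *dest, ...)
{
    va_list ap;
    const char *src;
    char *p = dest;

    va_start(ap, dest);
    while ((src = va_arg(ap, const char *)) != NULL) {
        size_t n = strlen(src);

        memcpy(p, src, n);   /* append without the terminating null */
        p += n;
    }
    va_end(ap);
    *p = '\0';
    return dest;
}
```

Usage: cat(buf, "ab", "cd", "e", (char *)0) leaves "abcde" in buf.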
dname(pathname)
Returns a pointer to the name of the directory that contains the file pointed to by pathname. Dname()
is the complement of sname() (see below). If pathname is a simple name (e.g., “file”), a pointer to
“.” is returned. If pathname is “/unix”, a pointer to “/” is returned. If pathname is “/bin/who”, a
pointer to “/bin” is returned, etc. The string pointed to by pathname is modified by dname(); pathname
is returned.


equal(str1, str2)
If string str1 is equal to string str2, returns 1; else returns 0.


imatch(prefix, str)
Initial match. If string prefix is a prefix of string str, returns 1; else returns 0.


index(str1, str2)
If string str2 is a substring of string str1, returns the offset of the first occurrence of str2 in str1; else
returns -1.
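imatch() and index() reduce to strncmp() and strstr() in ISO C. The following is a minimal sketch, not the library source. The index routine is renamed pw_index here (a hypothetical name) because many modern C libraries already declare an unrelated BSD index() in <string.h>.

```c
#include <string.h>

/* imatch(): 1 if prefix is an initial substring of str, else 0. */
int imatch(const char *prefix, const char *str)
{
    return strncmp(prefix, str, strlen(prefix)) == 0;
}

/* pw_index(): offset of the first occurrence of str2 in str1, or -1.
   Renamed to avoid clashing with the BSD index() in <string.h>. */
int pw_index(const char *str1, const char *str2)
{
    const char *hit = strstr(str1, str2);

    return hit != NULL ? (int)(hit - str1) : -1;
}
```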


move(a, b, n)
Copies the first n characters from string a to string b.


patoi(str)
Converts an ASCII string to an integer. The string str is taken to be a string of decimal digits; the
numeric value represented by str is returned. Converts positive numbers only. Returns -1 if a non-
numeric character is encountered.


long patol(str) 
Converts an ASCII string to a long integer. The string str is taken to be a string of decimal digits; the
numeric value represented by str is returned. Converts positive numbers only. Returns -1 if a non-
numeric character is encountered.


char *repeat(result, str, repfac)
The string str is first copied to the string result. Then str is copied repfac-1 times onto the end of
result. As with cat() (see above), allocation of space for result is the caller’s responsibility. Result is
returned.


char *satoi(str, ip) 
Satoi() is similar to patoi() (see above), except that the integer value is stored through the integer pointer
ip, and a pointer to the first non-numeric character encountered is returned.
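The difference between patoi() and satoi() is where they stop: patoi() rejects the whole string at the first non-digit, while satoi() stops there and reports the position. A minimal sketch in modern C of the documented behavior (overflow is not checked, as in the description above):

```c
/* patoi(): value of a string of decimal digits, or -1 if any
   non-digit is found. An empty string yields 0. */
int patoi(const char *str)
{
    int n = 0;

    for (; *str != '\0'; str++) {
        if (*str < '0' || *str > '9')
            return -1;
        n = n * 10 + (*str - '0');
    }
    return n;
}

/* satoi(): converts the leading digits of str, stores the value
   through ip, and returns a pointer to the first non-digit. */
char *satoi(char *str, int *ip)
{
    int n = 0;

    while (*str >= '0' && *str <= '9')
        n = n * 10 + (*str++ - '0');
    *ip = n;
    return str;
}
```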


size(str) 
Returns the number of bytes of memory used by string str, including the null byte, so that
size(str) is equal to length(str) + 1.


char *sname(str) 
Sname() returns a pointer to the “simple” name of path name str; i.e., a pointer to the first character
after the last “/” in str. If str does not contain a “/”, a pointer to the original string is returned.
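Since dname() and sname() are complements, both can be sketched around strrchr(). This is a minimal sketch in modern C, not the library source. One assumption to flag: for a simple name this dname() returns a pointer to a static "." rather than rewriting pathname, a case the description above leaves open.

```c
#include <string.h>

/* sname(): pointer to the last component of str. */
char *sname(char *str)
{
    char *slash = strrchr(str, '/');

    return slash != NULL ? slash + 1 : str;
}

/* dname(): directory part of pathname; truncates pathname in place. */
char *dname(char *pathname)
{
    static char dot[] = ".";
    char *slash = strrchr(pathname, '/');

    if (slash == NULL)          /* "file"     -> "." */
        return dot;
    if (slash == pathname)      /* "/unix"    -> "/" (keep the root slash) */
        slash++;
    *slash = '\0';              /* "/bin/who" -> "/bin" */
    return pathname;
}
```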


char *strend(str) 
Strend() returns a pointer to the end (null byte) of the string str.


substr(str, result, origin, len)
Copies at most len characters from the string str starting at str[origin] to the string pointed to by result.
Sufficient space must exist for that string; result is returned. There is no checking for the reasonableness
of the arguments. The copying of str to result stops if either the specified number (i.e., len, which is
taken as an unsigned integer) of characters have been copied, or if the end of str (i.e., a null byte) is
found. A large value of len (e.g., -1) will usually cause all of str to be copied.
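Taking len as unsigned is what makes -1 mean "copy everything": it converts to the largest unsigned value, so only the null byte stops the copy. A minimal sketch in modern C of the documented behavior, with no argument checking, as stated above:

```c
/* Copies at most len characters of str, starting at str[origin],
   into result and null-terminates it; returns result. */
char *substr(const char *str, char *result, int origin, unsigned len)
{
    const char *src = str + origin;
    char *dst = result;

    while (len-- > 0 && *src != '\0')   /* stop on count or end of str */
        *dst++ = *src++;
    *dst = '\0';
    return result;
}
```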


char *trnslat(str, old, new, result)
Copies string str to string result, replacing any character found in string old with the corresponding
character from string new; result is returned.
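The positional correspondence between old and new is easiest to see in code. A minimal sketch in modern C, not the library source; it assumes new is at least as long as old. (The parameter name new is legal in C, though not in C++.)

```c
#include <string.h>

/* Copies str to result, replacing each character that occurs in old
   with the character at the same position in new; returns result. */
char *trnslat(const char *str, const char *old, const char *new, char *result)
{
    char *dst = result;

    for (; *str != '\0'; str++) {
        const char *p = strchr(old, *str);

        *dst++ = p != NULL ? new[p - old] : *str;
    }
    *dst = '\0';
    return result;
}
```

For example, trnslat("hello", "lo", "LO", buf) yields "heLLO".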


verify(str1, str2)
If string str1 contains any characters not in string str2, returns the offset of the first such character in
str1; else returns -1.
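verify() is the mirror image of anystr(): it looks for the first character that is not in the set. A minimal sketch in modern C using strspn(), which counts the leading characters of str1 that do occur in str2:

```c
#include <string.h>

/* Offset of the first character of str1 not found in str2, or -1. */
int verify(const char *str1, const char *str2)
{
    size_t n = strspn(str1, str2);   /* prefix made only of str2 chars */

    return str1[n] != '\0' ? (int)n : -1;
}
```

A typical use is checking that a field is all digits: verify(field, "0123456789") returns -1 exactly when it is.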


char *zero(ptr, cnt) 
Sets to zero the area of memory cnt bytes long, starting at address ptr; ptr is returned.


char *zeropad(str) 
Replaces initial blanks with “0” characters in string str; str is returned.


2.2 Error Set 


The error set of subroutines consists of a general-purpose error handling routine called fatal(), and
general-purpose signal-setting and signal-catching routines called setsig() and setsig1(), respectively.
There are also two additional routines called clean_up() and userexit(), which may be called by fatal() or
setsig1(). Default versions of these two additional routines are supplied in the library. Users may
define their own clean_up() and userexit() routines.


The public include file “fatal.h” contains definitions needed to use fatal(). It contains the following:


extern int Fflags;
extern char *Ffile;
extern int Fvalue;
extern int (*Ffunc)();
extern int Fjmp[3];


# define FTLMSG 0100000 
# define FTLCLN 040000 
# define FTLFUNC 020000 


# define FTLACT 077 
# define FTLJMP 02 
# define FTLEXIT 01 
# define FTLRET 0 


# define FSAVE(val) SAVE(Fflags,old_Fflags); Fflags = val; 
# define FRSTR() RSTR(Fflags,old_Fflags);


fatal(msg) 
A general-purpose error handler. Typically, low-level subroutines that detect error conditions (an open
or create routine, for example) return as a value a call of fatal() with an appropriate message string.
For example:


return(fatal("can’t do it")); 


Higher-level routines control the execution of fatal() via the global word Fflags. The macros FSAVE()
and FRSTR() in “fatal.h” can be used by higher-level subroutines to save and restore the Fflags word.


The argument to fatal() is a pointer to a message string. The action of fatal() is driven completely
from the Fflags global integer, which is interpreted as explained below.


The FTLMSG bit controls the writing of the message on file descriptor 2. The message is preceded by
the string “ERROR: ”, unless the global character pointer Ffile is non-zero, in which case the message
is preceded by a string equivalent to:


s = sprintf(space, "ERROR [%s]: ", Ffile);
A new-line character is written after the user-supplied message. 
If the FTLCLN bit is on, clean_up() is called with an argument of 0 (see below). 


If the FTLFUNC bit is on, the function pointed to by the global function pointer Ffunc is called with 
the user-supplied message pointer as an argument. This feature can be used to log these messages. 


The FTLACT bits determine how fatal() should return. If the FTLJMP bit is one, longjmp(Fjmp) (see
setjmp(III)) is called. If the FTLEXIT bit is one, the value of userexit(1) is passed as an argument to
exit(II) (see below). If none of the FTLACT bits is on (the default value for Fflags is 0), the global
word Fvalue (initialized to -1) is returned.


If all fatal() globals have their default values, fatal() simply returns -1.
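The way fatal() decodes Fflags can be sketched in modern C. This is an illustration under stated assumptions, not the library’s source: it writes through stderr rather than calling write() on file descriptor 2, declares Fjmp as a jmp_buf instead of the PDP-11-specific int Fjmp[3], and gives Ffunc a prototype. The flag values are the ones from “fatal.h” above.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

#define FTLMSG  0100000
#define FTLCLN  040000
#define FTLFUNC 020000
#define FTLACT  077
#define FTLJMP  02
#define FTLEXIT 01
#define FTLRET  0

int Fflags;                 /* default 0: just return Fvalue */
char *Ffile;                /* if set, messages read "ERROR [Ffile]: " */
int Fvalue = -1;            /* returned when no FTLACT bit is on */
int (*Ffunc)(char *);       /* called with msg when FTLFUNC is on */
jmp_buf Fjmp;               /* modern stand-in for int Fjmp[3] */

void clean_up(int sig) { (void)sig; }      /* library default: no-op */
int userexit(int code) { return code; }    /* library default */

int fatal(char *msg)
{
    if (Fflags & FTLMSG) {
        if (Ffile != NULL)
            fprintf(stderr, "ERROR [%s]: %s\n", Ffile, msg);
        else
            fprintf(stderr, "ERROR: %s\n", msg);
    }
    if (Fflags & FTLCLN)
        clean_up(0);
    if ((Fflags & FTLFUNC) && Ffunc != NULL)
        (*Ffunc)(msg);
    if (Fflags & FTLJMP)
        longjmp(Fjmp, 1);        /* does not return */
    if (Fflags & FTLEXIT)
        exit(userexit(1));       /* does not return */
    return Fvalue;
}
```

A caller that wants control back on error sets Fflags to FTLJMP and calls setjmp(Fjmp) beforehand; with Fflags left at 0, fatal() merely returns Fvalue.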


setsig()
General-purpose signal-setting routine. All signals not already ignored or caught are made to be caught
by the signal-catching routine setsig1().


setsig1()
General-purpose signal catcher and termination routine. If a signal other than hangup, interrupt, or
quit is caught, a “user-oriented” help(I) message is printed on file descriptor 2. If hangup, interrupt,
or quit is caught, subsequent occurrences of that signal will be ignored. Termination is similar to the
FTLCLN and FTLEXIT options of fatal(), in that clean_up(sig) (where sig is the signal number) and
exit(userexit(1)) are called.


If the file ‘“‘dump.core’’ exists in the current directory, the IOT signal is set to 0 and adorr(III) is called 
to produce a core dump (after calling clean_up(), but before calling userexit()). 


clean_up() 
A default clean_up() routine is provided to resolve external references. It simply returns. User- 
supplied clean_up() routines are often used for removing temporary files, etc. 


userexit(code) 
A default userexit() routine is provided to resolve external references. It returns the value of code. 
User-supplied userexit() routines are often used for logging usage statistics. 


2.3 Sys Set 


The sys set of subroutines provides interfaces to system calls that process error conditions and call
fatal(). In addition, a few functions which are not available elsewhere are provided.


curdir(path) 
Places the complete pathname of the current directory in string path. Returns 0 on success, non-zero 
on failure. On successful return, the current directory is the same as it was on entry; on failure return, 
the current directory is not known. 


fdfopen(fd, mode)
This subroutine provides a file-descriptor interface to the routines in A New Input-Output Package, and
is required when one wants to use the routines in A New Input-Output Package with pipes. The first
argument is a file descriptor (from open(II), creat(II), or pipe(II)), the second is the read/write mode
(0/1, respectively). A file pointer (see A New Input-Output Package) is returned on success, and NULL
on failure (typically, because there are no file structures available).


giveup(dump) 
This routine does the following: 


Change directory to “/” if the argument is 0.
Set the IOT signal to the system default (0).
Call abort(III).


Thus, if giveup() is called with a 0 argument, and the file “/core” is not writable (or if the file “/core”
doesn’t exist and the directory “/” is not writable), no core dump will be produced.


lockit(lockfile, count, pid) 

A process semaphore implemented with files; typically used to establish exclusive use of a resource
(usually a file). The file’s name is lockfile. Lockit() tries count times to create lockfile with mode 444. It
sleeps 10 seconds between tries. If lockfile is created, the number pid (typically, the process ID of the
current process) is written (in binary; i.e., as two bytes) into lockfile, and 0 is returned. If lockfile exists
and hasn’t been modified within the last 60 seconds, and if it either is empty or its first two bytes,
interpreted as a binary number, are not the process ID of any existing process, lockfile is removed and
lockit() tries again to make lockfile. After count tries, or if the reason for the creation of lockfile failing
is something other than EACCES (see INTRO(II)), lockit() returns -1. See also unlockit(), below.


rename(oldname, newname) 
Renames oldname to be newname; it can be thought of as: 


mv oldname newname 
It calls xlink() and xunlink() (see below).


unlockit(lockfile, pid) 
Unlockit() is meant to be used to remove a lockfile created by lockit(). It verifies that the pid specified is
contained in the first two bytes of the named lockfile, and then removes it. If the pids match, and the
file is successfully removed, unlockit() returns 0; otherwise, -1 is returned.
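The lockit()/unlockit() protocol can be sketched with POSIX calls. This is not the PWB code, and it makes two labeled substitutions: it creates the lock with O_CREAT | O_EXCL (atomic even for the superuser) rather than relying on creat() of a mode-444 file failing with EACCES, and it stores pid as a native int rather than two bytes. lock_is_stale is a hypothetical helper.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: 1 if lockfile is at least 60 seconds old and
   either empty or naming a process that no longer exists. */
static int lock_is_stale(const char *lockfile)
{
    struct stat st;
    int owner, fd;
    ssize_t n;

    if (stat(lockfile, &st) != 0 || time(NULL) - st.st_mtime < 60)
        return 0;
    if ((fd = open(lockfile, O_RDONLY)) < 0)
        return 0;
    n = read(fd, &owner, sizeof owner);
    close(fd);
    if (n != (ssize_t)sizeof owner)
        return 1;                            /* empty or short file */
    return kill((pid_t)owner, 0) == -1 && errno == ESRCH;
}

int lockit(const char *lockfile, int count, int pid)
{
    while (count-- > 0) {
        int fd = open(lockfile, O_WRONLY | O_CREAT | O_EXCL, 0444);

        if (fd >= 0) {                       /* we own the lock */
            ssize_t n = write(fd, &pid, sizeof pid);
            close(fd);
            return n == (ssize_t)sizeof pid ? 0 : -1;
        }
        if (errno != EEXIST)
            return -1;                       /* some other failure */
        if (lock_is_stale(lockfile)) {
            unlink(lockfile);                /* remove it and retry at once */
            continue;
        }
        if (count > 0)
            sleep(10);
    }
    return -1;
}

int unlockit(const char *lockfile, int pid)
{
    int owner = 0;
    int fd = open(lockfile, O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = read(fd, &owner, sizeof owner);
    close(fd);
    if (n != (ssize_t)sizeof owner || owner != pid)
        return -1;                           /* not our lock */
    return unlink(lockfile) == 0 ? 0 : -1;
}
```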


userdir(uid) 
Returns user’s login directory name. The argument must be an integer user ID. There is an assump- 
tion that the directory field is the fifth field of a password file entry (i.e., there is no ‘‘group id’’ in the 
password file). Returns a pointer to the login directory on success, 0 on failure. It remembers its argu- 
ment and the returned login directory name for subsequent calls to speed itself up. Users of PWB/UNIX 
systems should use logdir() (see loginfo(II)).


username(uid) 
Returns user’s login name. The argument must be an integer user ID. Returns a pointer to the login 
name on success, a pointer to the string representation of the user ID on failure. There is an assump- 
tion that the login name field is the first field of a password file entry. It remembers its argument and 
the returned login name for subsequent calls to speed itself up. Users of PWB/UNIX systems should use
logname() (see loginfo(II)).


xalloc(size), xfree(ptr), xfreeall()

Xalloc() and xfree() are used in the same way as alloc(III) and free(III). The function xfreeall() frees
all memory allocated by xalloc() (it calls brk(II)). Xalloc() returns the address of the allocated area on
success, and the value of fatal() on failure. Xfree() and xfreeall() don’t return anything. Xalloc() uses
a “first fit” strategy (unlike alloc(III)). Xfree() always coalesces contiguous free blocks. Xalloc() always
allocates 2-byte words. Xalloc() actually allocates one more word than the amount requested. The
extra word (the first word of the allocated block) contains the size (in bytes) of the entire block. This
size is used by xfree() to identify contiguous blocks, and is used by xalloc() to implement the first-fit
strategy. Bad things will happen if that first (size) word is overwritten. Worse things happen if xfree()
is called with a garbage argument.


xcreat(name, mode) 
Xcreat() is used in the same way as creat(II). Xcreat() requires write permission in the pertinent direc-
tory in all cases, and the created file is guaranteed to have the specified mode and be owned by the
effective user (xcreat() does this by first unlinking the file to be created); xcreat() returns a file descrip-
tor on success, and the value of fatal() on failure.


xfcreat(file, mode) 
Xfcreat() is a macro that combines xcreat() and fdfopen(); its definition is:


fdfopen(xcreat(file, mode), 1)


xfopen(file, mode) 
Xfopen() is a macro that combines xopen() (see below) and fdfopen(); its definition is:


fdfopen(xopen(file, mode), mode) 
xlink(f1, f2) 


Xlink() is used in the same way as link(II). It is an interface to link(II) that handles all error condi-
tions. It returns 0 on success, and the value of fatal() on failure.


xmsg(file, funcname)
Xmsg() is used by the other x-routines to generate an error message based on errno (see INTRO(II)).
It calls fatal() with the appropriate error message. The second argument is a pointer to the calling 
function’s name (a string). There are predefined messages for the most common errors. Other errors 
cause a message of the form: 
str = sprintf(space, "error = %d, function = '%s'", errno, funcname)
to be passed to fatal(). 


xopen(name, mode) 
Xopen() is used in the same way as open(II). It is an interface to open(II) that handles all error condi-
tions. It returns a file descriptor on success, and the value of fatal() on failure.


xpipe() 
Xpipe() is used in the same way as pipe(II). It is an interface to pipe(II) that handles all error condi-
tions. It returns 0 on success, and the value of fatal() on failure.


xunlink(p) 
Xunlink() is used in the same way as unlink(II). It is an interface to unlink(II) that handles all error
conditions. It returns 0 on success, and the value of fatal() on failure.


2.4 Write Set 


write(fildes, buffer, nbytes) 
Write() is used in the same way as write(II). It is an interface to syswrite() (see below) that handles all 
error conditions. It returns the number of bytes written on success, and the value of fatal() on failure.


syswrite(fildes, buffer, nbytes) 
Syswrite() is identical to write(II), except that the name write has been changed to syswrite. 
1. PREFACE


A set of background processes supports remote job entry (RJE) from a PWB/UNIX* computer to IBM Sys-
tem/360 and /370 host computers. “Hasp” is the common name used for the collection of programs
and for the file organization that provides this facility; it allows PWB/UNIX to communicate with IBM’s
Job Entry Subsystem by mimicking an IBM 2770 remote station. The PWB/UNIX User’s Manual page
hasp(VIII) summarizes their design and operating procedures. That manual also contains a terse
description of the send(I) command, which is the user’s primary interface to RJE.¹ These are the
definitive sources for information about RJE. Although the word “Hasp” may be used in this guide, it
represents all IBM RJE subsystems of the PWB/UNIX System.


This guide is a tutorial overview of RJE.² It is addressed to the user who needs to know how to use the
system, but does not need to know details of its implementation. The two following sections constitute
an introduction to RJE.


2. PRELIMINARIES 


To become a PWB/UNIX user, you must receive a login name that identifies you to the PWB/UNIX system.
You should also get a copy of the PWB/UNIX User’s Manual. This is a fairly complete description of the
system and includes a section entitled “How to Get Started,” which introduces you to PWB/UNIX; you
should read that section before proceeding with this guide.


In order to begin using RJE, you need only become familiar with a subset of basic commands. You 
must understand the directory structure of the file system, and you should know something about the 
attributes of files: see chdir(I), chmod(I), chown(I), cp(I), ln(I), ls(I), mkdir(I), mv(I), rm(I). You
must know how to enter, edit, and examine text files: see cat(I), ed(I), pr(I). You should know how
to communicate with other users and with the system: see mail(I), mesg(I), who(I), write(I). And,
finally, you might have to know how to describe your terminal to the system: see ascii(V), stty(I),
tabs(I).


3. BASIC RJE 


Let’s suppose that you have used the editor, ed(I), to create a file ‘‘jobfile’’ that contains your control 
statements (JCL) and input data. This file should look exactly like a card deck, except that alphabetic 
characters, for convenience, may be in either upper or lower case. Here is an example: 


% cat jobfile 
//gener job (9999,r740),pgmrname,class=x usr=(mylogin,myplace)
//step exec pgm=iebgener
//sysprint dd sysout=a
//sysin dd dummy
//sysut2 dd sysout=a
//sysut1 dd *
first card of data


last card of data
/*


* UNIX is a Trademark of Bell Laboratories. 

1. In this paper, RJE refers to the PWB/UNIX facilities provided by hasp(VIII), and not to the Remote Job Entry feature of
IBM’s HASP or JES2 subsystems.

2. The original versions of this manual and of RJE itself were written by T. G. Lyons. 
To submit this job for execution, you must invoke the send(I) command:
% send jobfile 
The system will reply: 


10 cards. 
Queued as /usr/hasp/xmit311. 


Note that send tells you how many cards it submitted and reports the position that your job has been 
assigned in the queue of all jobs waiting to be transmitted to the host system. Until the transmission of
the job actually begins, you can prevent the job from being transmitted by doing a “chmod 0” on the
queued file to make it unreadable. For our example, you could say: “chmod 0 /usr/hasp/xmit311”.


When your job is accepted by the host system, a job number will be assigned to it, and an acknowledge- 
ment message will be generated. This indicates that your job has been scheduled on the host system. 
Later, after the job has executed, its output will be returned to the PWB/UNIX system. You will be 
notified automatically of both of these events: if you are logged in when RJE detects these events, and if 
you are permitting messages to be sent to your terminal (see mesg(I)), the following two messages will 
be sent to you (still using the example above) when the job is scheduled and when the output is
returned, respectively: 


Two bells 
$12.18.42 JOB 384 ON RM4.RDI -- GENER PGMRNAME 
Bell 


Two bells 
12:21:54 /al/user/rje/prnt0 384.gener ready 
Bell 


The job-acknowledgement message is passed on directly from the host system, as indicated by the fact 
that it appears in upper case. The output-ready message is generated by RJE and appears in lower case. 
Two bells, with an interval of one second between them, precede each message. They should be inter-
preted as a warning to stop typing on your terminal, so that the imminent message is not interspersed
with your typing. 


If you are not logged in when one of these events occurs, or if you do not allow messages to be sent to 
your terminal, then the notification will be posted to you via the mail(I) command. You can prevent
messages directly by executing the mesg(I) command, or indirectly by executing another command,
such as pr(I), which prohibits messages for as long as it is active. You may inspect (by invoking the
mail command) your .mail file at any time for messages that have been diverted. For this example, this
might look as follows: 


% mail 


From rje Mon Aug 1 12:20:36 1977 
$12.18.42 JOB 384 ON RM4.RDI -- GENER PGMRNAME 


From rje Mon Aug 1 12:21:55 1977
12:21:54 /al/user/rje/prnt0 384.gener ready


Save? 
Note that there may exist a discrepancy between the host and PWB/UNIX clocks. 


The job-acknowledgement message performs two functions. First, it confirms the fact that your job has 
been scheduled for eventual execution. Second, it assigns a number to the job in such a way that the 
number and the name together will uniquely identify the job for some period of time.


The output-ready message provides the name of a PWB/UNIX file into which output has been written and
identifies the job to which the output belongs (see ls(I)):


% ls -l prnt0
-r--r-xr-- 1 rje 1184 Aug 1 12:21 prnt0


Note that rje retains ownership of the output and allows you only read access to it. It is intended that 
you will inspect the file, perhaps extract some information from it and then promptly delete it (see 
rm(I)):


% rm -f prnt0 


The retention of machine-generated files, such as RJE output, is discouraged. It is your responsibility to 
remove files from your RJE directory. Files of RJE output may not exceed 256K bytes. In addition, 
only files of up to 64K bytes are guaranteed to be accepted in their entirety. Limits of 64K, 128K, or 192K
may be automatically enforced if file space gets scarce. Output beyond the current limit will be dis- 
carded, with no provision for retrieving it. The user should also be aware of the fact that RJE attempts 
to keep roughly 1000 ‘‘blocks’’ free on any file system it uses. Warning messages or suspension of cer- 
tain functions will occur as this limit is approached. 


The most elementary way to examine your output is to cat it to your terminal. The Appendix shows 
the result of listing the output of our sample job in this way. Printouts are stored with standard tabs to 
conserve space, so you must ensure that the tabs are set on your terminal at every eighth column
across the entire line; tabs(I) will do that for you.³ Because PWB/UNIX has no high-volume printing
capability, you should route to the host’s printer any large listings of which you desire a hard copy. 


The structure of an output listing will generally conform to the following sequence: 


HASP log 
jcl information 
data sets 
HASP end 


“Burst” pages are discarded. Single, double, and triple spacing is reflected in the output file, but other
forms controls, such as the skip to the top of a new page, are suppressed. Page boundaries are indi- 
cated by the presence of a space character at the end of the last line of each page. 


The big file scanner bfs(I) or the context editor ed(I) provide a more flexible method than cat(I) for
examining printed output; bfs can handle files of any size and is more efficient than ed for scanning
files. 


RJE is also capable of receiving punched output as formatted files (see ebcdic(V)); this format allows an
exact representation of an arbitrary card deck to be stored on the PWB/UNIX machine. However, there 
are few commands that can be used to manipulate EBCDIC files. You will probably want to route your 
punched output to one of the host’s output devices. 


4. SEND COMMAND 


The send(I) command is capable of more general processing than has been indicated in the previous 
section. In the first place, it will concatenate a sequence of files to create a single job stream. This 
allows files of JCL and files of data to be maintained separately on the PWB/UNIX machine. In addition,
it recognizes any line of an input file that begins with the character ‘‘””’ as being a control line that can 
call for the inclusion, inside the current file, of some other file. This allows you to “send” a top-level
skeleton that “pulls” in subordinate files as needed. Some of these may be “virtual” files that actually
consist of the output of PWB/UNIX commands or Shell procedures. Furthermore, the send command is
able to collect input directly from a terminal, and can be instructed to prompt for required information. 


Each source of input can contain a format specification that determines such things as how to expand tabs and how long an input line can be. The manual page for fspec(V) explains how to define such formats. When properly instructed, send will also replace arbitrarily defined "keywords" by other text strings or by EBCDIC character codes. (These two substitution facilities are useful in applications other than RJE; for that reason, send may be invoked under the name gath to produce standard output without submitting an RJE job.)


3. If your terminal doesn't have tabs, you should use the stty(I) command to make PWB/UNIX convert tabs to spaces automatically on output to the terminal.




Two aspects of send with which everyone should be acquainted are the ability to specify the computer to which a job is to be submitted, and the ability to verify a job prior to submission. To run our sample job on a host machine known to RJE as "A", we would issue the command:

% send A jobfile

When no host is explicitly cited, send makes a reasonable choice. To verify the text of a collected job stream without actually submitting it, set the "-lq" flags:

% send -lq jobfile

The complete list of arguments and flags that control the execution of send can be found in send(I).


5. JOB STREAM 


It is assumed that the job stream submitted as the result of a single execution of send consists of a single job, i.e., the file that is queued for transmission should contain one JOB card near the beginning and no others. A priority control card may legitimately precede the JOB card. The JOB card must conform to the local installation's standard. At BISP, it has the following structure:


//name job (acct[,...]),pgmrname[,keywds=...] [usr=...]


6. USER SPECIFICATION

The "usr=..." field is required if any print or punch output is to be delivered to the PWB/UNIX user. It has the form:

usr=(login,place[,[level][,retry]])


where login is the PWB/UNIX login name of the user, level is the desired level of notification (see the end of this section for an explanation), retry is a one-character code specifying the number of attempts to retransmit an entire job if the transmission to the host computer is interrupted by an unrecoverable error (the default is three attempts; the digits "1" through "9" specify that number of retries; "0", "y", or "Y" invoke the default; any other entry in this field limits the number of attempts to one, i.e., no retry), and place is as follows:


A. If place is the name of a directory (writable by others), then the output file is placed there as a 
unique prnt or pnch file (up to 500 of each allowed). The mode of the file will be 454. 


B. If place is the name of an existing, writable (by others), non-executable (by others) file, then the 
output file replaces it. The mode of the file will be 454. 


C. If place is the name of a non-existent file in a writable (by others) directory, then the output file is 
placed there. The mode of the file will be 454. 


D. If place is the name of an executable (by others) file, then the RJE output is set up as standard input to place, and place is executed. Five string arguments are passed to place. For example, if place is a shell procedure, the following arguments are passed as $1 ... $5:


     1. Flag indicating whether file space is scarce in the file system where place resides. 0 indicates that space is not scarce, while 1 indicates that it is.

     2. Job name.

     3. Programmer's name.

     4. Job number.

     5. Login name from the "usr=..." specification.


     A ":" is passed if a value is not present.

E. In all other cases, the output will be thrown away.
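The five cases can be summarized as a decision procedure. The Python sketch below is an illustration only: the name classify_place is hypothetical, and it assumes that "writable by others" and "executable by others" mean the corresponding permission bits for others (S_IWOTH and S_IXOTH), which may not be exactly how the RJE software tests them. It does not enforce the pathname rule given next.

```python
import os
import stat


def classify_place(path):
    """Return which case (A-E) of the place rules applies to path."""
    try:
        mode = os.stat(path).st_mode
    except (FileNotFoundError, NotADirectoryError):
        parent = os.path.dirname(path) or "."
        try:
            parent_mode = os.stat(parent).st_mode
        except OSError:
            return "E"
        if stat.S_ISDIR(parent_mode) and parent_mode & stat.S_IWOTH:
            return "C"  # new file in a directory writable by others
        return "E"
    if stat.S_ISDIR(mode):
        # A directory writable by others receives a unique prnt/pnch file.
        return "A" if mode & stat.S_IWOTH else "E"
    if mode & stat.S_IXOTH:
        return "D"  # executable by others: run it with the output as stdin
    if mode & stat.S_IWOTH:
        return "B"  # existing, writable, non-executable file: replace it
    return "E"
```

Checking the directory case first matters, because directories normally carry execute bits that would otherwise be mistaken for case D.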


The place value must not be a full pathname, unless it refers to an executable file (see D above). For 
cases A, B, and C above (and case D, if a full pathname is not supplied), the name of the user’s login 
directory will be used to form a full pathname. 
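Taken together, these rules can be sketched as a small parser for the usr=... field. In this Python sketch the names parse_usr and retry_attempts are hypothetical; the retry digit is read as the total number of attempts (the text mixes "attempts" and "retries", so this is an interpretation), and the default level is taken to be "54" (see the end of this section). The restriction of full pathnames to executable files is not checked.

```python
import os
import re

DEFAULT_ATTEMPTS = 3
DEFAULT_LEVEL = "54"
USR_FIELD = re.compile(r"usr=\(([^)]*)\)", re.IGNORECASE)


def retry_attempts(code):
    """Decode the one-character retry field (read here as total attempts)."""
    if code in ("", "0", "y", "Y"):
        return DEFAULT_ATTEMPTS
    if len(code) == 1 and code in "123456789":
        return int(code)
    return 1  # any other entry: a single attempt, no retry


def parse_usr(text, login_dir):
    """Extract login, place, level, and retry from a usr=(...) field."""
    match = USR_FIELD.search(text)
    if match is None:
        return None
    fields = [field.strip() for field in match.group(1).split(",")]
    fields += [""] * (4 - len(fields))  # pad the missing optional fields
    login, place, level, retry = fields[:4]
    if not place.startswith("/"):
        # Relative names are resolved against the user's login directory.
        place = os.path.join(login_dir, place)
    return {
        "login": login,
        "place": place,
        "level": level or DEFAULT_LEVEL,
        "attempts": retry_attempts(retry),
    }
```

For example, parse_usr("usr=(mylogin,myplace)", "/usr/mylogin") resolves place to /usr/mylogin/myplace with the default level and three attempts.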
The "usr=..." field may occur anywhere within the first 100 card images sent and within the first 200 output images received by the PWB/UNIX system. The only restrictions are:


• Column one must contain a "/" or a "#".
• "usr=..." must begin after column 4 and must be preceded by a space.

Therefore, the "usr=..." field may be placed on the JOB card or a comment card, passed as data, etc.
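The placement rules above lend themselves to a simple scan. The hypothetical find_usr_card below is a Python sketch that applies them to a list of card images; its lead_chars parameter lets the same check use "@" and "*" for UNIVAC RJE (Addendum #1). It only locates the card: it does not validate the field's contents, and it ignores the separate 200-image limit on received output.

```python
def find_usr_card(cards, limit=100, lead_chars="/#"):
    """Return the index of the first card image carrying a usable usr= field.

    Only the first `limit` card images are examined; column 1 must hold one
    of `lead_chars`; "usr=" must be preceded by a space and begin after
    column 4 (0-based index 4 or later).
    """
    for index, card in enumerate(cards[:limit]):
        if not card or card[0] not in lead_chars:
            continue
        # Searching for " usr=" from index 3 puts "usr=" at index >= 4 (column >= 5).
        if card.lower().find(" usr=", 3) != -1:
            return index
    return None
```

A card such as "/* usr=(me,out)" is rejected because "usr=" starts in column 4, not after it.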


For redirection of output by the host, a "usr=..." card, if not already present, must be supplied by the user. This can be done by placing a job step that creates this card before your output steps.


Messages generated by RJE or passed on from the host are assigned a level of importance ranging from 
1 to 9. The levels currently in use are: 


3 transmittal assurance 
5 job acknowledgement 
6 output ready message 
7 transmit format error 


The optional "level" field of the "usr=..." specification must be a one- or two-digit code. A message from the host with importance x (where x comes from the above list) is compared with each of the two decimal digits "mn". If x ≥ n and the user is logged in and accepting messages, the message is written to his or her terminal. Otherwise, if x ≥ m, the message is mailed to the user. In all other cases, the message is discarded. The default level field is "54". You should specify level "1" if you want to receive complete notification, and level "59" to divert the last three messages in the above list to your mailbox.
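The comparison rule can be written out directly. In this Python sketch, route_message is a hypothetical name, and treating a one-digit level d as if it were "dd" is an assumption: it reproduces the behavior described for level "1", but the text does not state it.

```python
def route_message(importance, level="54", logged_in=False, accepting=False):
    """Return "terminal", "mail", or "discard" for an RJE message.

    importance: 1-9; level: the one- or two-digit "mn" code from usr=...
    """
    if len(level) == 1:
        level = level * 2  # assumption: a single digit "d" behaves like "dd"
    m, n = int(level[0]), int(level[1])
    if importance >= n and logged_in and accepting:
        return "terminal"
    if importance >= m:
        return "mail"
    return "discard"
```

With the default "54", a level-3 transmittal assurance is discarded, while messages 5, 6, and 7 go to the terminal when possible and to the mailbox otherwise; with "59", nothing reaches the terminal, so 5, 6, and 7 are all mailed.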


7. CONTROL CARDS 


A number of control cards are recognized by the host’s HASP subsystem. Two are of particular interest 
to RJE users, because they control the disposition of output. These are the ROUTE and OUTPUT cards, 
and their use is illustrated below. If used, both should be inserted into a job stream immediately after the JOB card.


The ROUTE card can be used to direct the entire printed or punched output of a job to a specified desti- 
nation. Two cards are required to direct both outputs: 


/*route  punch local
/*route  print rmt55


The ROUTE card has a fixed format. ‘‘Print’’ or ‘‘punch’’ must begin in column 10, and the destination 
field in column 16. 
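Because the columns are fixed, they are easy to get wrong by hand. A hypothetical helper that pads the card to the required columns might look like this Python sketch (column numbers in the comments are 1-based; string indices are 0-based).

```python
def route_card(kind, destination):
    """Build a fixed-format /*ROUTE card."""
    if kind not in ("print", "punch"):
        raise ValueError("kind must be 'print' or 'punch'")
    card = "/*route".ljust(9) + kind  # kind occupies columns 10-14
    return card.ljust(15) + destination  # destination starts in column 16
```

For instance, route_card("print", "rmt55") yields a card with "print" at index 9 and "rmt55" at index 15.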


The proper use of the OUTPUT card is a bit more complicated. It allows you to associate parameters with all SYSOUT data sets whose forms numbers match the one specified on the OUTPUT card. The forms number is fictitious and may consist of up to four characters. A copy count and a destination are among the parameters that may be associated with SYSOUT data sets in this way:


//name job...
/*output py d=rmt56
/*output abcd d=local,n=2
//step exec ...
//prt1 dd sysout=a
//prt2 dd sysout=(a,,abcd)
//prt3 dd sysout=(a,,py)
//pnch dd sysout=(b,,abcd)


In the above example, one copy of prt1 would be directed to the default destination and one copy of prt3 to rmt56. Two copies each would be made of prt2 and pnch, and they would remain at the local site.


8. MONITORING RJE 


RJE is designed to be an autonomous facility that does not require manual supervision. RJE is initiated 
by the PWB/UNIX operator after system ‘‘reboots’’ and continues in execution indefinitely. Experience 
has proved it to be reasonably robust, although it is vulnerable to system crashes and reconfigurations. 


Users have a right to assume that, if the PWB/UNIX system is up for production use, RJE should also be up. This implies more than an ability to execute the send(I) command, which should be available at all times. It means that queued jobs should be submitted to the host for execution and their output returned to the PWB/UNIX system. If a user cannot obtain any throughput from RJE, the user should so advise the PWB/UNIX operators.


The rjestat(I) command, invoked without the "-" argument, reports the status of all RJE links for which a given PWB/UNIX system is configured. It may sometimes also print a message of the day from RJE.


% rjestat 
15:12:24 RJE to B is operating normally. 


15:12:25 A is not responding to RJE. 
(8 files queued since 14:34:26) 


A parenthetical statement, such as the last line above, summarizes any backlog of queued files waiting to be transmitted to the host machine. A backlog that persists for 20 minutes or more often indicates a problem with the corresponding RJE link.


A host machine may be reported to be not responding to RJE because it is down, or because of its 
operator’s failure to initialize the associated line, or because of a communications hardware failure. 




Appendix—Sample Output Listing 


% cat rje/prntd 
14.40.31 JOB 384 $HASP373 GENER STARTED - INIT 26 - CLASS X - SYS RRMA
14.40.32 JOB 384 $HASP395 GENER ENDED


30 AUG 77 JOB EXECUTION DATE
        54 CARDS READ
        76 SYSOUT PRINT RECORDS
         0 SYSOUT PUNCH RECORDS
      0.01 MINUTES EXECUTION TIME
1 //GENER JOB (9999,R740),PGMRNAME,CLASS=X JOB 384
*** USR=(MYLOGIN,MYPLACE)
//IEBGENER EXEC PGM=IEBGENER
//SYSPRINT DD DUMMY
//SYSIN DD DUMMY
//SYSUT2 DD SYSOUT=A
//SYSUT1 DD *
//
IEF236I ALLOC. FOR GENER IEBGENER
IEF237I DMY ALLOCATED TO SYSPRINT
IEF237I DMY ALLOCATED TO SYSIN
IEF237I JES ALLOCATED TO SYSUT2
IEF237I JES ALLOCATED TO SYSUT1
IEF142I GENER IEBGENER - STEP WAS EXECUTED - COND CODE 0000
IEF285I JES2.JOB0384.S00102 SYSOUT
IEF285I JES2.JOB0384.S10101 SYSIN
IEF373I STEP /IEBGENER/ START 77242.1440
IEF374I STEP /IEBGENER/ STOP 77242.1440 CPU 0MIN 00.13SEC SRB 0MIN 00.01SEC VIRT 36K SYS 188K




******* SERVICE UNITS=0000174 SERVICE RATE=0000268 SERVICE UNITS/SECOND
******* PERFORMANCE GROUP=005
******* EXCP COUNT BY UNIT ADDRESS

IEF375I JOB /GENER / START 77242.1440
IEF376I JOB /GENER / STOP 77242.1440 CPU 0MIN 00.13SEC SRB 0MIN 00.01SEC

******* SERVICE UNITS=0000174 SERVICE RATE=0000268 SERVICE UNITS/SECOND
******* APPROXIMATE PROCESSING TIME= 01 MINUTES
******* EXCPS=000000000
******* PROJECTED CHARGES= OL


first line of data 


last line of data 


*OS/VS2 REL 3.7 JES2 END JOBNAME=GENER BIN=R740 JOB #=384 PGMRNAME
*OS/VS2 REL 3.7 JES2 END JOBNAME=GENER BIN=R740 JOB #=384 PGMRNAME
*OS/VS2 REL 3.7 JES2 END JOBNAME=GENER BIN=R740 JOB #=384 PGMRNAME


% rm -f rje/prntd 
ADDENDUM #1
UNIVAC RJE 


I. PREFACE {1.}*


A set of PWB/UNIX† background processes supports remote job entry (RJE) from a PWB/UNIX computer to UNIVAC 1100-series host computers. "Uvac" is the common name used for the collection of programs and for the file organization that provides this facility.


II. BASIC RJE {3.}


Alphabetic characters, for convenience, may be in either upper or lower case. The master space "@" may be represented as "*". Here is an example:


% cat jobfile
*run echo,acct-no,project-id . usr=(mylogin,myplace)
*ed,i .elt
first card of data

last card of data
*fin


The system will reply: 


8 cards. 
Queued as /usr/uvac/xmit311.


The job acknowledgement messages are:


Two bells 
$12.18.42 001,.ECHO STARTED ACCT-NO 
Bell 


Two bells 
12:21:54 /al/user/rje/prntO .echo ready 
Bell 


To route your printout to the host’s printer, insert the following control card into the run stream fol- 
lowing the RUN card: 


*sym print$,,pr


"Burst" pages are not discarded, and the reception of punched output is not supported.




III. JOB STREAM {5.}
The RUN card must conform to the local installation’s standard. In general, it has the following format: 


@run  name,acct-no,project-id [ . usr=...] 


IV. USER SPECIFICATION {6.}


The "usr=..." field may occur anywhere within the first 100 card images sent and within the first 200 output images received by the PWB/UNIX system. The only restrictions are:


• Column 1 must contain an "@" or a "*".
• "usr=..." must begin after column 4 and must be preceded by a space.


Therefore, the "usr=..." field may be placed on the RUN card or a message card, passed as data, etc.


* Numbers enclosed in curly braces are section numbers of Guide to IBM Remote Job Entry for PWB/UNIX Users by A. L. Sabsevitz, September 1977.

† UNIX is a Trademark of Bell Laboratories.


Edition 1.2, February 1978
V. CONTROL CARDS {7.}
This section of the guide is not applicable to UNIVAC RJE. 


VI. MONITORING RJE {8.}
The interactive status terminal capability of the rjestat(I) command is not implemented.
ABSTRACT


The Source Code Control System (SCCS) is a system for controlling changes to files of text (typically, the source code and documentation of software systems). It provides facilities for storing, updating, and retrieving any version of a file of text, for controlling updating privileges to that file, for identifying the version of a retrieved file, and for recording who made each change, when and where it was made, and why. SCCS is a collection of programs that run under the PWB/UNIX* time-sharing system.


This document, together with the PWB/UNIX User's Manual [4], is a complete user's guide to Version 4 of SCCS, and supersedes all previous versions of the SCCS/PWB manual; it covers the following topics:


• How to get started with SCCS.
• The version numbering scheme.
• Basic information needed for day-to-day use of SCCS commands, including a discussion of the more useful arguments.
• Protection and auditing of SCCS files, including the differences between the use of SCCS by individual users on one hand, and groups of users on the other.


Neither the implementation of SCCS nor the installation procedure for SCCS is described here.


1. INTRODUCTION


The Source Code Control System (SCCS) is a collection of PWB/UNIX [1] commands that help individuals or projects control and account for changes to files of text (typically, the source code and documentation of software systems). It is convenient to conceive of SCCS as a custodian of files; it allows retrieval of particular versions of the files, administers changes to them, controls updating privileges to them, and records who made each change, when and where it was made, and why. This is important in environments in which programs and documentation undergo frequent changes (because of maintenance and/or enhancement work), inasmuch as it is sometimes desirable to regenerate the version of a program or document as it was before changes were applied to it. Obviously, this could be done by keeping copies (on paper or other media), but this quickly becomes unmanageable and wasteful as the number of programs and documents increases. SCCS provides an attractive solution because it stores on disk the original file and, whenever changes are made to it, stores only the changes; each set of changes is called a "delta."


This document, together with the PWB/UNIX User's Manual [4], is a complete user's guide to Version 4 of SCCS. This manual contains the following sections:


• SCCS for Beginners: How to make an SCCS file, how to update it, and how to retrieve a version thereof.
• How Deltas Are Numbered: How versions of SCCS files are numbered and named.
• SCCS Command Conventions: Conventions and rules generally applicable to all SCCS commands.
• SCCS Commands: Explanation of all SCCS commands, with discussions of the more useful arguments.
• SCCS Files: Protection, format, and auditing of SCCS files, including a discussion of the differences between using SCCS as an individual and using it as a member of a group or project. The role of a "project SCCS administrator" is introduced.

* UNIX is a Trademark of Bell Laboratories.


2. SCCS FOR BEGINNERS 


It is assumed that the reader knows how to log onto a PWB/UNIX system, create files, and use the text editor [2,3]. A number of terminal-session fragments are presented below. All of them should be tried: the best way to learn SCCS is to use it.


To supplement the material in this manual, the detailed SCCS command descriptions (appearing in alphabetical order in Section I of [4]) should be consulted. Section 5 below contains a list of all the SCCS commands. For the time being, however, only basic concepts will be discussed.


2.1 Terminology 


Each SCCS file is composed of one or more sets of changes applied to the null (empty) version of the file, with each set of changes usually depending on all previous sets. Each set of changes is called a "delta" and is assigned a name, called the SCCS IDentification string (SID), composed of at most four components, only the first two of which will concern us for now; these are the "release" and "level" numbers, separated by a period. Hence, the first delta is called "1.1", the second "1.2", the third "1.3", etc. The release number can also be changed (usually, this indicates a major change to the file) as discussed below.


Each delta of an SCCS file defines a particular version of the file. For example, delta 1.5 defines version 1.5 of the SCCS file, obtained by applying to the null (empty) version of the file the changes that constitute deltas 1.1, 1.2, etc., up to and including delta 1.5 itself, in that order.
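The idea that a version is the null file plus an ordered sequence of deltas can be illustrated with a toy model. This Python sketch is not how SCCS stores deltas on disk; it simply records each delta as the changed regions reported by difflib.SequenceMatcher and replays them in order. The function names and the sample list (including the two added languages) are hypothetical.

```python
import difflib


def make_delta(old, new):
    """Record only the changed regions: (start, end, replacement lines)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old, delta):
    """Apply one delta to a version, producing the next version."""
    result = list(old)
    # Work from the end of the file so earlier positions stay valid.
    for start, end, replacement in reversed(delta):
        result[start:end] = replacement
    return result


def version(deltas, count):
    """Rebuild the version defined by the first `count` deltas."""
    lines = []  # the null (empty) version
    for delta in deltas[:count]:
        lines = apply_delta(lines, delta)
    return lines


if __name__ == "__main__":
    v1 = ["c", "pl/i", "fortran", "cobol", "algol"]
    v2 = v1 + ["snobol", "ratfor"]
    deltas = [make_delta([], v1), make_delta(v1, v2)]
    print(version(deltas, 2))
```

Only the second delta's two new lines are stored for version 2; rebuilding it still requires replaying the first delta.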


2.2 Creating an SCCS File—The ‘‘admin’’ Command 
Consider, for example, a file called "lang" that contains a list of programming languages:


c
pl/i
fortran
cobol
algol


We wish to give custody of this file to SCCS. The following admin command (which is used to administer SCCS files) creates an SCCS file and initializes delta 1.1 from the file "lang":


admin -ilang s.lang


All SCCS files must have names that begin with "s.", hence, "s.lang". The -i keyletter, together with its value "lang", indicates that admin is to create a new SCCS file and initialize it with the contents of the file "lang". This initial version is a set of changes applied to the null SCCS file; it is delta 1.1.


The admin command replies: 
No id keywords (cm7) 


This is a warning message (which may also be issued by other SCCS commands) that is to be ignored for the purposes of this section. Its significance is described in Section 5.1 below.


The file "lang" should be removed (because it can be easily reconstructed by using the get command, below):


rm lang 
2.3 Retrieving a File—The ‘‘get’’ Command 
The command: 

get s.lang 


causes the creation (retrieval) of the latest version of file ‘‘s.lang’’, and prints the following messages: 


1.1 
5 lines 
No id keywords (cm7) 


This means that get retrieved version 1.1 of the file, which is made up of 5 lines of text. The retrieved text is placed in a file whose name is formed by deleting the "s." prefix from the name of the SCCS file; hence, the file "lang" is created.


The above get command simply creates the file "lang" read-only, and keeps no information whatsoever regarding its creation. On the other hand, in order to be able to subsequently apply changes to an SCCS file with the delta command (see below), the get command must be informed of your intention to do so. This is done as follows:


get -e s.lang


The -e keyletter causes get to create a file "lang" for both reading and writing (so that it may be edited) and places certain information about the SCCS file in another new file, called the p-file, that will be read by the delta command. The get command prints the same messages as before, except that the warning message is not issued.


The file "lang" may now be changed, for example, by:

ed lang
27
$a
snobol
ratfor
.
w
41
q


2.4 Recording Changes—The ‘‘delta’’ Command 

In order to record within the sccs file the changes that have been applied to “‘lang’’, execute: 
delta s.lang 

Delta prompts with: 
comments? 

the response to which should be a description of why the changes were made; for example: 
comments? added more languages 


Delta then reads the p-file, and determines what changes were made to the file "lang". It does this by doing its own get to retrieve the original version, and by applying diff(I)¹ to the original version and the edited version.
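To see where the counts printed by delta come from, the same three numbers can be computed from any line-by-line comparison. The Python sketch below uses difflib.SequenceMatcher as a stand-in for diff(I), so for complicated edits its counts may differ from those delta reports; the name delta_stats and the two appended languages are hypothetical.

```python
import difflib


def delta_stats(old, new):
    """Count inserted, deleted, and unchanged lines between two versions."""
    inserted = deleted = unchanged = 0
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        else:
            # A "replace" is counted as deletions plus insertions.
            deleted += i2 - i1
            inserted += j2 - j1
    return inserted, deleted, unchanged


if __name__ == "__main__":
    v1 = ["c", "pl/i", "fortran", "cobol", "algol"]
    v2 = v1 + ["snobol", "ratfor"]
    print("%d inserted\n%d deleted\n%d unchanged" % delta_stats(v1, v2))
```

Appending two lines to the five-line file gives 2 inserted, 0 deleted, and 5 unchanged.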


When this process is complete, at which point the changes to ‘‘lang’’ have been stored in ‘‘s.lang’’, 
delta outputs: 


No id keywords (cm7) 
1.2 

2 inserted 

0 deleted 

5 unchanged 


The number "1.2" is the name of the delta just created, and the next three lines of output refer to the number of lines in the file "s.lang".


1. All references of the form name(N) refer to item name in section N of the PWB/UNIX User's Manual [4].


2.5 More about the ‘‘get’’ Command
As we have seen: 
get s.lang 


retrieves the latest version (now 1.2) of the file ‘‘s.lang’’. This is done by starting with the original 
version of the file and successively applying deltas (the changes) in order, until all have been applied. 


For our example, the following commands are all equivalent:
get s.lang
get -r1 s.lang
get -r1.2 s.lang


The numbers following the -r keyletter are SIDs (see Section 2.1 above). Note that omitting the level number of the SID (as in the second example above) is equivalent to specifying the highest level number that exists within the specified release. Thus, the second command requests the retrieval of the latest version in release 1, namely 1.2. The third command specifically requests the retrieval of a particular version, in this case, also 1.2.
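The retrieval rule for trunk SIDs can be expressed as a small function. This Python sketch (the name resolve_get is hypothetical) represents SIDs as (release, level) tuples and covers only the three forms used above; branch SIDs and the other cases listed in get(I) are out of scope.

```python
def resolve_get(trunk, requested=None):
    """Return the trunk SID that `get [-rSID]` retrieves.

    trunk: existing trunk SIDs as (release, level) tuples.
    requested: None, "R", or "R.L".
    """
    trunk = sorted(trunk)
    if requested is None:
        return trunk[-1]  # the latest version
    parts = tuple(int(part) for part in requested.split("."))
    if len(parts) == 1:
        in_release = [sid for sid in trunk if sid[0] == parts[0]]
        if not in_release:
            raise LookupError("no deltas in release %d" % parts[0])
        return in_release[-1]  # highest level within the release
    if len(parts) == 2 and parts in trunk:
        return parts
    raise LookupError("nonexistent SID " + requested)
```

With trunk [(1, 1), (1, 2)], all three commands shown above resolve to (1, 2).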


Whenever a truly major change is made to a file, the significance of that change is usually indicated by changing the release number (first component of the SID) of the delta being made. Since normal, automatic, numbering of deltas proceeds by incrementing the level number (second component of the SID), we must indicate to SCCS that we wish to change the release number. This is done with the get command:


get -e -r2 s.lang


Because release 2 does not exist, get retrieves the latest version before release 2; it also interprets this as a request to change the release number of the delta we wish to create to 2, thereby causing it to be named 2.1, rather than 1.3. This information is conveyed to delta via the p-file. Get then outputs:


1.2 
7 lines 


indicating the retrieval of version 1.2. If the file is now edited, for example, by:

ed lang
41
/cobol/d
w
35
q

and delta executed:


delta s.lang
comments? deleted cobol from list of languages 


we will see, by delta's output, that version 2.1 is indeed created:


No id keywords (cm7) 
2.1 

0 inserted 

1 deleted

6 unchanged 


Deltas may now be created in release 2 (deltas 2.2, 2.3, etc.), or another new release may be created in 
a similar manner. This process may be continued as desired. 
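The naming of the next delta, as described in this section, can be summarized in a few lines. The hypothetical plan_edit function below is a Python sketch restricted to the trunk cases shown here: no -r, -r naming the highest existing release, or -r naming a release higher than any that exists. Requests for older releases are deliberately rejected, because get(I) may create branch deltas in those cases.

```python
def plan_edit(trunk, release=None):
    """Return (SID retrieved, SID of the next delta) for `get -e [-rR]`."""
    trunk = sorted(trunk)
    latest = trunk[-1]
    if release is None or release == latest[0]:
        return latest, (latest[0], latest[1] + 1)  # bump the level number
    if release > latest[0]:
        return latest, (release, 1)  # a new release starts at level 1
    raise NotImplementedError("older releases may lead to branch deltas; see get(I)")
```

With deltas 1.1 and 1.2, plan_edit(trunk) gives ((1, 2), (1, 3)), while plan_edit(trunk, 2) gives ((1, 2), (2, 1)), matching the 2.1 naming in the example above.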


2.6 The ‘‘help’’ Command

If the command: 
get abc 

is executed, the following message will be output: 
ERROR [abc]: not an SCCS file (co1)


The string "co1" is a code for the diagnostic message, and may be used to obtain a fuller explanation of that message by use of the help command:


help co1
This produces the following output: 


co1:

“not an SCCS file” 

A file that you think is an SCCS file 
does not begin with the characters "s.". 


Thus, help is a useful command to use whenever there is any doubt about the meaning of an SCCS message. Fuller explanations of almost all SCCS messages may be found in this manner.


3. HOW DELTAS ARE NUMBERED 


It is convenient to conceive of the deltas applied to an SCCS file as the nodes of a tree, in which the root is the initial version of the file. The root delta (node) is normally named "1.1" and successor deltas (nodes) are named "1.2", "1.3", etc. The components of the names of the deltas are called the "release" and the "level" numbers, respectively. Thus, normal naming of successor deltas proceeds by incrementing the level number, which is performed automatically by SCCS whenever a delta is made. In addition, the user may wish to change the release number when making a delta, to indicate that a major change is being made. When this is done, the release number also applies to all successor deltas, unless specifically changed again. Thus, the evolution of a particular file may be represented as in Figure 1.


Figure 1. Evolution of an SCCS File (diagram: trunk deltas of Release 1 followed by those of Release 2)


Such a structure may be termed the "trunk" of the SCCS tree. It represents the normal sequential development of an SCCS file, in which changes that are part of any given delta are dependent upon all the preceding deltas.


However, there are situations in which it is necessary to cause a branching in the tree, in that changes applied as part of a given delta are not dependent upon all previous deltas. As an example, consider a program which is in production use at version 1.3, and for which development work on release 2 is already in progress. Thus, release 2 may already have some deltas, precisely as shown in Figure 1. Assume that a production user reports a problem in version 1.3, and that the nature of the problem is such that it cannot wait to be repaired in release 2. The changes necessary to repair the trouble will be applied as a delta to version 1.3 (the version in production use). This creates a new version that will then be released to the user, but will not affect the changes being applied for release 2 (i.e., deltas 1.4, 2.1, 2.2, etc.).


The new delta is a node on a "branch" of the tree, and its name consists of four components, namely, the release and level numbers, as with trunk deltas, plus the "branch" and "sequence" numbers, as follows:


release.level.branch.sequence 
The branch number is assigned to each branch that is a descendant of a particular trunk delta, with the first such branch being 1, the next one 2, and so on. The sequence number is assigned, in order, to each delta on a particular branch. Thus, 1.3.1.2 identifies the second delta of the first branch that derives from delta 1.3. This is shown in Figure 2.


Figure 2. Tree Structure with Branch Deltas (diagram: trunk 1.1, 1.2, 1.3, 1.4, 2.1, 2.2; Branch 1 leaves delta 1.3 and contains delta 1.3.1.2)
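Because trunk and branch SIDs differ only in the number of components, a name can be classified mechanically. A minimal Python sketch, with hypothetical function names:

```python
def parse_sid(text):
    """Split an SID into integers; trunk SIDs have 2 parts, branch SIDs 4."""
    parts = tuple(int(part) for part in text.split("."))
    if len(parts) not in (2, 4) or min(parts) < 1:
        raise ValueError("not a trunk or branch SID: " + text)
    return parts


def describe_sid(text):
    """Explain what an SID's components mean."""
    sid = parse_sid(text)
    if len(sid) == 2:
        return "trunk delta: release %d, level %d" % sid
    release, level, branch, sequence = sid
    return "delta %d on branch %d of trunk delta %d.%d" % (sequence, branch, release, level)
```

describe_sid("1.3.1.2") returns "delta 2 on branch 1 of trunk delta 1.3".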


The concept of branching may be extended to any delta in the tree; the naming of the resulting deltas 
proceeds in the manner just illustrated. 


Two observations are of importance with regard to naming deltas. First, the names of trunk deltas contain exactly two components, and the names of branch deltas contain exactly four components. Second, the first two components of the name of branch deltas are always those of the ancestral trunk delta, and the branch component is assigned in the order of creation of the branch, independently of its location relative to the trunk delta. Thus, a branch delta may always be identified as such from its name. Although the ancestral trunk delta may be identified from the branch delta's name, it is not possible to determine the entire path leading from the trunk delta to the branch delta. For example, if delta 1.3 has one branch emanating from it, all deltas on that branch will be named 1.3.1.n. If a delta on this branch then has another branch emanating from it, all deltas on the new branch will be named 1.3.2.n (see Figure 3). The only information that may be derived from the name of delta 1.3.2.2 is that it is the chronologically second delta on the chronologically second branch whose trunk ancestor is delta 1.3. In particular, it is not possible to determine from the name of delta 1.3.2.2 all of the deltas between it and its trunk ancestor (1.3).


Figure 3. Extending the Branching Concept (diagram: trunk 1.1, 1.2, 1.3, 1.4, 2.1, 2.2; Branch 1 from delta 1.3, through 1.3.1.2; Branch 2 emanating from a delta on Branch 1)
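The naming rule (branch numbers are allocated per trunk ancestor, in order of creation, no matter where in the tree a branch starts) can be modeled with two counters. This Python sketch is illustrative: BranchNamer is a hypothetical class, and it does not check that a new delta is added at the tip of its branch.

```python
class BranchNamer:
    """Hand out branch-delta SIDs following the naming rule in the text."""

    def __init__(self):
        self.branches = {}   # (release, level) -> branches created so far
        self.sequences = {}  # (release, level, branch) -> deltas on that branch

    def new_branch(self, parent):
        """Start a branch at `parent` (trunk or branch SID); return its first delta."""
        trunk = parent[:2]  # only the trunk ancestor matters for the name
        branch = self.branches.get(trunk, 0) + 1
        self.branches[trunk] = branch
        self.sequences[trunk + (branch,)] = 1
        return trunk + (branch, 1)

    def extend(self, branch_delta):
        """Add the next delta to the branch that `branch_delta` lies on."""
        key = branch_delta[:3]
        self.sequences[key] += 1
        return key + (self.sequences[key],)
```

Starting a second branch from delta 1.3.1.1 yields 1.3.2.1, and its next delta is 1.3.2.2; nothing in that name records the passage through branch 1.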


It is obvious that the concept of branch deltas allows the generation of arbitrarily complex tree structures. Although this capability has been provided for certain specialized uses, it is strongly recommended that the SCCS tree be kept as simple as possible, because comprehension of its structure becomes extremely difficult as the tree becomes more complex.


4. SCCS COMMAND CONVENTIONS 


This section discusses the conventions and rules that apply to SCCS commands. These rules and conventions are generally applicable to all SCCS commands, except as indicated below. SCCS commands accept two types of arguments: keyletter arguments and file arguments.




Keyletter arguments (hereafter called simply "keyletters") begin with a minus sign (-), followed by a lower-case alphabetic character, and, in some cases, followed by a value. These keyletters control the execution of the command to which they are supplied.


File arguments (which may be names of files and/or directories) specify the file(s) that the given SCCS command is to process; naming a directory is equivalent to naming all the SCCS files within the directory. Non-SCCS files and unreadable² files in the named directories are silently ignored.


In general, file arguments may not begin with a minus sign. However, if the name "-" (a lone minus sign) is specified as an argument to a command, the command reads the standard input for lines and takes each line as the name of an SCCS file to be processed. The standard input is read until end-of-file. This feature is often used in pipelines [4] with, for example, the find(I) or ls(I) commands. Again, names of non-SCCS files and of unreadable files are silently ignored.


All keyletters specified for a given command apply to all file arguments of that command. All keyletters are processed before any file arguments, with the result that the placement of keyletters is arbitrary (i.e., keyletters may be interspersed with file arguments). File arguments, however, are processed left to right.
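Since every keyletter applies to every file, a command can collect all keyletters first and then walk the file arguments left to right. The Python sketch below shows only that ordering rule and the lone "-" convention; the name split_args is hypothetical, and it neither expands directories nor skips non-SCCS or unreadable files.

```python
import sys


def split_args(argv, stdin=None):
    """Separate keyletters from file arguments.

    - An argument starting with "-" followed by more text is a keyletter.
    - A lone "-" means: read file names, one per line, from standard input.
    - File names keep their left-to-right order.
    """
    stdin = sys.stdin if stdin is None else stdin
    keyletters, files = [], []
    for arg in argv:
        if arg == "-":
            files.extend(line.strip() for line in stdin if line.strip())
        elif arg.startswith("-"):
            keyletters.append(arg)
        else:
            files.append(arg)
    return keyletters, files
```

For example, split_args(["-e", "s.a", "-r2", "s.b"]) returns (["-e", "-r2"], ["s.a", "s.b"]), so both keyletters apply to both files regardless of where they appeared.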


Somewhat different argument conventions apply to the help, what, and sccsdiff commands (see Sections 5.5, 5.8, and 5.9).


Certain actions of various SCCS commands are controlled by flags appearing in SCCS files. Some of these flags are discussed below. For a complete description of all such flags, see admin(I).


The distinction between the real user and the effective user of a PWB/UNIX system is of concern in discussing various actions of SCCS commands. For the present, it is assumed that the real user and the effective user are one and the same (i.e., the user who is logged into a PWB/UNIX system); this subject is further discussed in Section 6.1.


All SCCS commands that modify an SCCS file do so by writing a temporary copy, called the x-file, which ensures that the SCCS file will not be damaged should processing terminate abnormally. The name of the x-file is formed by replacing the "s." of the SCCS file name with "x.". When processing is complete, the old SCCS file is removed and the x-file is renamed to be the SCCS file. The x-file is created in the directory containing the SCCS file, is given the same mode (see chmod(I)) as the SCCS file, and is owned by the effective user.


To prevent simultaneous updates to an SCCS file, commands that modify SCCS files create a lock-file, called the z-file, whose name is formed by replacing the "s." of the SCCS file name with "z.". The z-file contains the process number [1] of the command that creates it, and its existence is an indication to other commands that that SCCS file is being updated. Thus, other commands that modify SCCS files will not process an SCCS file if the corresponding z-file exists. The z-file is created with mode 444 (read-only) in the directory containing the SCCS file, and is owned by the effective user. This file exists only for the duration of the execution of the command that creates it. In general, users can ignore x-files and z-files; they may be useful in the event of system crashes or similar situations.
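The x-file and z-file protocol amounts to "lock, write a copy, rename". This Python sketch imitates it for illustration; sccs_update is a hypothetical name, os.O_EXCL supplies the "refuse if the z-file exists" behavior, and os.replace renames the x-file over the old s-file in one step instead of first removing the old file as the text describes.

```python
import os


def sccs_update(sfile, transform):
    """Rewrite an s-file under a z-file lock, via an x-file copy.

    transform: a function from the old file text to the new file text.
    Raises FileExistsError if the z-file already exists.
    """
    directory, base = os.path.split(sfile)
    if not base.startswith("s."):
        raise ValueError("not an SCCS file: " + sfile)
    zfile = os.path.join(directory, "z." + base[2:])
    xfile = os.path.join(directory, "x." + base[2:])
    # O_EXCL makes creation fail if another process holds the lock.
    fd = os.open(zfile, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444)
    try:
        with os.fdopen(fd, "w") as lock:
            lock.write(str(os.getpid()))  # the z-file records the process number
        with open(sfile) as old:
            text = transform(old.read())
        with open(xfile, "w") as new:
            new.write(text)
        os.chmod(xfile, os.stat(sfile).st_mode & 0o7777)  # same mode as the s-file
        os.replace(xfile, sfile)  # rename the x-file over the s-file
    finally:
        os.remove(zfile)  # release the lock
```

If the lock cannot be created, the function raises before touching anything, so an existing z-file (for example, one left by another user's update) is never removed.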


SCCS commands produce diagnostics (on the diagnostic output [5]) of the form:
ERROR [name-of-file-being-processed]: message text (code) 


The code in parentheses may be used as an argument to the help command (see Section 5.5) to obtain a
further explanation of the diagnostic message.
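Because the code is always the parenthesized item at the end of the diagnostic, it can be extracted mechanically and then handed to help. A minimal Python sketch, assuming exactly the format shown above; help_code is a hypothetical name, and the sample message text and code are invented for illustration.

```python
import re

# ERROR [name-of-file-being-processed]: message text (code)
DIAGNOSTIC = re.compile(r"^ERROR \[(?P<file>[^\]]*)\]: (?P<text>.*) \((?P<code>[^()]+)\)$")


def help_code(line):
    """Return (file, message text, code) for an SCCS diagnostic line, or None."""
    match = DIAGNOSTIC.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group("file"), match.group("text"), match.group("code")


if __name__ == "__main__":
    # Hypothetical diagnostic; the message and code are made up.
    print(help_code("ERROR [s.abc]: example message text (xy1)"))
```

The greedy match on the message text means that, if the message itself contains parentheses, the last parenthesized item is still taken as the code.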


Detection of a fatal error during the processing of a file causes the SCCS command to terminate process- 
ing of that file and to proceed with the next file, in order, if more than one file has been named. 
5. SCCS COMMANDS 


This section describes the major features of all the SCCS commands. Detailed descriptions of the
commands and of all their arguments are given in [4], and should be consulted for further information.
The discussion below covers only the more common arguments of the various SCCS commands.


2. Because of permission modes (see chmod(I)).

Because the commands get and delta are the most frequently used, they are presented first. The other
commands follow in approximate order of importance.


The following is a summary of all the SCCS commands and of their major functions:

get       Retrieves versions of SCCS files.
delta     Applies changes (deltas) to the text of SCCS files, i.e., creates new versions.
admin     Creates SCCS files and applies changes to parameters of SCCS files.
prt       Formats and prints portions of SCCS files.
help      Gives explanations of diagnostic messages.
rmdel     Removes a delta from an SCCS file; allows the removal of deltas that were created by
          mistake.
chghist   Changes the commentary associated with a delta.
what      Searches any PWB/UNIX file(s) for all occurrences of a special pattern and prints out what
          follows it; is useful in finding identifying information inserted by the get command.
sccsdiff  Shows the differences between any two versions of an SCCS file.
comb      Combines two or more consecutive deltas of an SCCS file into a single delta; often
          reduces the size of the SCCS file.


5.1 get 


The get command creates a text file that contains a particular version of an SCCS file. The particular
version is retrieved by beginning with the initial version, and then applying deltas, in order, until the
desired version is obtained. The created file is called the g-file; its name is formed by removing the
‘‘s.’’ from the SCCS file name. The g-file is created in the current directory [1] and is owned by the real
user. The mode assigned to the g-file depends on how the get command is invoked, as discussed below.


The most common invocation of get is: 
get s.abc 


which normally retrieves the latest version on the trunk of the SCCS file tree, and produces (for
example) on the standard output [5]:


1.3 
67 lines 
No id keywords (cm7) 


which indicates that: 


1. Version 1.3 of file ‘‘s.abc’’ was retrieved (1.3 is the latest trunk delta). 
2. This version has 67 lines of text. 
3. No ID keywords were substituted in the file (see Section 5.1.1 for a discussion of ID keywords). 


The generated g-file (file ‘‘abc’’) is given mode 444 (read-only), since this particular way of invoking
get is intended to produce g-files only for inspection, compilation, etc., and not for editing (i.e., not for
making deltas).


In the case of several file arguments (or directory-name arguments), similar information is given for
each file processed, but the SCCS file name precedes it. For example:


get s.abc s.def 


produces: 


s.abc:
1.3
67 lines
No id keywords (cm7)


5.1.1 ID Keywords


In generating a g-file to be used for compilation, it is useful and informative to record the date and time
of creation, the version retrieved, the module’s name, etc., within the g-file, so as to have this
information appear in a load module when one is eventually created. SCCS provides a convenient mechanism
for doing this automatically. Identification (ID) keywords appearing anywhere in the generated file are
replaced by appropriate values according to the definitions of these ID keywords. The format of an ID
keyword is an upper-case letter enclosed by percent signs (%). For example:


%I% 


is defined as the ID keyword that is replaced by the SID of the retrieved version of a file. Similarly,
%H% is defined as the ID keyword for the current date (in the form ‘‘mm/dd/yy’’), and %M% is
defined as the name of the g-file. Thus, executing get on an SCCS file that contains the PL/I declaration:


DCL ID CHAR(100) VAR INIT('%M% %I% %H%');

gives (for example) the following:

DCL ID CHAR(100) VAR INIT('MODNAME 2.3 07/07/77');

When no ID keywords are substituted by get, the following message is issued:

No id keywords (cm7)


This message is normally treated as a warning by get, although the presence of the i flag in the SCCS file
causes it to be treated as an error (see Section 5.2 for further information).


For a complete list of the approximately twenty ID keywords provided, see get(I).
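The substitution amounts to a scan for an upper-case letter between percent signs. The Python sketch below handles only the three keywords described above (get(I) documents the full set); the function name expand_keywords and the choice to leave unrecognized keywords untouched are assumptions of this illustration. It also returns a count, so that a caller can issue the ‘‘No id keywords (cm7)’’ warning when nothing was replaced.

```python
import re
from datetime import date

KEYWORD = re.compile(r"%([A-Z])%")


def expand_keywords(text, sid, module, today):
    """Replace %I%, %H%, and %M%; return (new_text, number_of_substitutions)."""
    values = {
        "I": sid,                         # SID of the retrieved version
        "H": today.strftime("%m/%d/%y"),  # current date as mm/dd/yy
        "M": module,                      # module name (the g-file name by default)
    }
    count = 0

    def substitute(match):
        nonlocal count
        letter = match.group(1)
        if letter not in values:
            return match.group(0)         # keywords outside this sketch are left alone
        count += 1
        return values[letter]

    return KEYWORD.sub(substitute, text), count


if __name__ == "__main__":
    line = "DCL ID CHAR(100) VAR INIT('%M% %I% %H%');"
    expanded, n = expand_keywords(line, "2.3", "MODNAME", date(1977, 7, 7))
    print(expanded)
    if n == 0:
        print("No id keywords (cm7)")
```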

5.1.2 Retrieval of Different Versions


Various keyletters are provided to allow the retrieval of other than the default version of an SCCS file.
Normally, the default version is the most recent delta of the highest-numbered release on the trunk of
the SCCS file tree. However, if the SCCS file being processed has a d (default SID) flag, the SID specified
as the value of this flag is used as a default. The default SID is interpreted in exactly the same way as
the value supplied with the -r keyletter of get.


The -r keyletter is used to specify an SID to be retrieved, in which case the d (default SID) flag (if any)
is ignored. For example:


get -r1.3 s.abc

retrieves version 1.3 of file ‘‘s.abc’’, and produces (for example) on the standard output:


1.3 
64 lines 


A branch delta may be retrieved similarly:

get -r1.5.2.3 s.abc

which produces (for example) on the standard output:


1.5.2.3 
234 lines 


When a two- or four-component SID is specified as a value for the -r keyletter (as above) and the
particular version does not exist in the SCCS file, an error message results. Omission of the level
number, as in:


get -r3 s.abc

causes retrieval of the trunk delta with the highest level number within the given release, if the given
release exists. Thus, the above command might output:

3.7
213 lines


If the given release does not exist, get retrieves the trunk delta with the highest level number within the
highest-numbered existing release that is lower than the given release. For example, assuming release
9 does not exist in file ‘‘s.abc’’, and that release 7 is actually the highest-numbered release below 9,
execution of:


get -r9 s.abc

might produce:


7.6 
420 lines 


which indicates that trunk delta 7.6 is the latest version of file ‘‘s.abc’’ below release 9. Similarly,
omission of the sequence number, as in:

get -r4.3.2 s.abc


results in the retrieval of the branch delta with the highest sequence number on the given branch, if it 
exists. (If the given branch does not exist, an error message results.) This might result in the follow- 
ing output: 

4.3.2.8
89 lines
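These rules amount to a small search over the SIDs present in the file. The Python sketch below models an SCCS file simply as a collection of SID tuples (an assumption; a real SCCS file carries much more), and parse_sid and resolve_sid are hypothetical names. The d flag and the -t keyletter are not modeled.

```python
def parse_sid(text):
    """Turn an SID such as "1.5.2.3" into the tuple (1, 5, 2, 3)."""
    return tuple(int(part) for part in text.split("."))


def resolve_sid(requested, deltas):
    """Return the delta that get -r would retrieve, following the rules above.

    requested: a tuple of one to four SID components.
    deltas: the SIDs present in the (hypothetical) SCCS file, as tuples.
    """
    deltas = set(deltas)
    n = len(requested)
    if n in (2, 4):
        # A full trunk or branch SID must name an existing delta.
        if requested in deltas:
            return requested
    elif n == 1:
        # Highest level in the release or, if it does not exist, in the highest lower release.
        trunk = [d for d in deltas if len(d) == 2 and d[0] <= requested[0]]
        if trunk:
            return max(trunk)
    elif n == 3:
        # Highest sequence number on the given branch.
        branch = [d for d in deltas if len(d) == 4 and d[:3] == requested]
        if branch:
            return max(branch)
    else:
        raise ValueError("an SID has one to four components")
    raise LookupError("nonexistent SID " + ".".join(map(str, requested)))


if __name__ == "__main__":
    sids = [parse_sid(s) for s in ["1.1", "3.1", "3.7", "4.3", "4.3.2.8", "7.6"]]
    print(resolve_sid((9,), sids))  # (7, 6)
```

Comparing whole tuples with max() picks the highest release first and then the highest level within it, which is exactly the fallback rule for a nonexistent release.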


The -t keyletter is used to retrieve the latest (‘‘top’’) version in a particular release (i.e., when no -r
keyletter is supplied, or when its value is simply a release number). The latest version is defined as
that delta which was produced most recently, independent of its location on the SCCS file tree. Thus, if
the most recent delta in release 3 is 3.5,


get -r3 -t s.abc

might produce:


3.5 
59 lines 


However, if branch delta 3.2.1.5 were the latest delta (created after delta 3.5), the same command 
might produce: 


3.2.1.5
46 lines 
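The difference from plain -r is that -t ranks deltas by when they were made rather than by their position in the tree. A minimal Python sketch, assuming the deltas are available as SID tuples listed oldest first; latest_in_release is a hypothetical name, and raising an error for a release with no deltas is a choice of this sketch only.

```python
def latest_in_release(release, deltas_in_creation_order):
    """Return the most recently created delta in a release, on the trunk or a branch.

    deltas_in_creation_order: SID tuples listed oldest first (an assumed model).
    """
    latest = None
    for sid in deltas_in_creation_order:
        if sid[0] == release:
            latest = sid  # later entries are newer, so keep overwriting
    if latest is None:
        raise LookupError(f"no deltas in release {release}")
    return latest


if __name__ == "__main__":
    history = [(3, 1), (3, 2), (3, 2, 1, 1), (3, 3), (3, 4), (3, 5)]
    print(latest_in_release(3, history))                    # (3, 5)
    print(latest_in_release(3, history + [(3, 2, 1, 5)]))   # (3, 2, 1, 5)
```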


5.1.3 Retrieval with Intent to Make a Delta 


Specification of the -e keyletter to the get command is an indication of the intent to make a delta, and,
as such, its use is restricted. The presence of this keyletter causes get to:


1. Check the user list (which is the list of login names of users allowed to make deltas (see Section
6.2)) to determine if the login name of the user executing get is on that list. Note that a null
(empty) user list behaves as if it contained all possible login names.

2. Check that the release (R) of the version being retrieved satisfies the relation: 


floor ≤ R ≤ ceiling


to determine if the release being accessed is a protected release. The floor and ceiling are specified
as flags in the SCCS file.

A failure of either condition causes the processing of that SCCS file to terminate.
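The two checks can be sketched as a single predicate. This Python version is illustrative only: may_make_delta is a hypothetical name, the floor and ceiling are treated as inclusive bounds (a release equal to either is accepted), and passing None stands for ‘‘flag not set’’, which is an assumption of the sketch rather than a statement about the defaults admin uses.

```python
def may_make_delta(login, release, user_list, floor=None, ceiling=None):
    """Apply the user-list and protected-release checks described above."""
    # 1. A null (empty) user list behaves as if it contained every login name.
    if user_list and login not in user_list:
        return False
    # 2. The release must satisfy floor <= R <= ceiling (None: no bound, by assumption).
    if floor is not None and release < floor:
        return False
    if ceiling is not None and release > ceiling:
        return False
    return True


if __name__ == "__main__":
    print(may_make_delta("xyz", 3, []))                    # True
    print(may_make_delta("abc", 3, ["xyz"]))               # False
```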


If the above checks succeed, the -e keyletter causes the creation of a g-file in the current directory
with mode 644 (readable by everyone, writable only by the owner) owned by the real user. If a writable
g-file already exists, get terminates with an error. This is to prevent inadvertent destruction of a g-file
that already exists and is being edited for the purpose of making a delta.


Any ID keywords appearing in the g-file are not substituted by get when the -e keyletter is specified,
because the generated g-file is to be subsequently used to create another delta, and replacement of ID
keywords would cause them to be permanently changed within the SCCS file. In view of this, get does
not need to check for the presence of ID keywords within the g-file, so that the message:


No id keywords (cm7) 
is never output when get is invoked with the -e keyletter.


In addition, the -e keyletter causes the creation (or updating) of a p-file, which is used to pass
information to the delta command (see Section 5.1.4).


The following is an example of the use of the -e keyletter:

get -e s.abc

which produces (for example) on the standard output:


1.3 
67 lines 


If the -r and/or -t keyletters are used together with the -e keyletter, the version retrieved for
editing is as specified by the -r and/or -t keyletters.


The keyletters -i and -x may be used to specify a list (see get(I) for the syntax of such a list) of
deltas to be included and excluded, respectively, by get. Including a delta means forcing the changes that
constitute the particular delta to be included in the retrieved version. This is useful if one wants to
apply the same changes to more than one version of the SCCS file. Excluding a delta means forcing it
not to be applied. This may be used to undo, in the version of the SCCS file to be created, the effects of a
previous delta. Whenever deltas are included or excluded, get checks for possible interference between
such deltas and those deltas that are normally used in retrieving the particular version of the SCCS file.
(Two deltas can interfere, for example, when each one changes the same line of the retrieved g-file.)
Any interference is indicated by a warning that shows the range of lines within the retrieved g-file in
which the problem may exist. The user is expected to examine the g-file to determine whether a
problem actually exists, and to take whatever corrective measures (if any) are deemed necessary (e.g., edit
the file).


The -i and -x keyletters should be used with extreme care.


The -k keyletter is provided to facilitate regeneration of a g-file that may have been accidentally
removed or ruined subsequent to the execution of get with the -e keyletter, or to simply generate a
g-file in which the replacement of ID keywords has been suppressed. Thus, a g-file generated by the -k
keyletter is identical to one produced by get executed with the -e keyletter. However, no processing
related to the p-file takes place.


5.1.4 The p-file and Concurrent Deltas 


The ability to retrieve different versions of an SCCS file allows a number of deltas to be ‘‘in progress’’ at
any given time. This means that a number of get commands with the -e keyletter may be executed on
the same file, provided that no two executions retrieve the same version nor lead to the subsequent
creation of the same version by delta.


The p-file (which is created by the get command invoked with the -e keyletter) is named by replacing
the ‘‘s.’’ in the SCCS file name with ‘‘p.’’. It is created in the directory containing the SCCS file, is given
mode 644 (readable by everyone, writable only by the owner), and is owned by the effective user. The
p-file contains the following information for each delta that is still ‘‘in progress’’:³


3. Other information may be present, but is not of concern here. See get(I) for further discussion.




• The SID of the retrieved version.
• The SID that will be given to the new delta when it is created.
• The login name of the real user executing get.


The first execution of ‘‘get -e’’ causes the creation of the p-file for the corresponding SCCS file.
Subsequent executions only update the p-file by inserting a line containing the above information. Before
inserting this line, however, get checks that:


• No entry already in the p-file specifies as already retrieved the SID of the version to be retrieved.
• The new (‘‘to-be-created’’) SID is not already specified as such in the p-file.


If both checks succeed, the user is informed that other deltas are in progress, and processing continues.
If either check fails, an error message results. It is important to note that the various executions of get
should be carried out from different directories. Otherwise, only the first execution will succeed, since
subsequent executions would attempt to overwrite a writable g-file, which is an SCCS error condition. In
practice, such multiple executions are performed by different users,⁴ so that this problem does not
arise, since each user normally has a different working directory [5].
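The two checks made before a new p-file line is added can be sketched as follows. The model is an assumption: the p-file is represented as a Python list of (retrieved SID, new SID, login name) tuples, omitting the other information mentioned in the footnote, and add_pfile_entry is a hypothetical name. The return value says whether other deltas were already in progress, which is when the user would be informed.

```python
def add_pfile_entry(entries, retrieved, new, login):
    """Record a "get -e" in a p-file modeled as a list of (retrieved, new, login) tuples.

    Returns True if other deltas were already in progress.
    """
    for old_retrieved, old_new, _ in entries:
        if old_retrieved == retrieved:
            raise ValueError(f"version {retrieved} is already retrieved for editing")
        if old_new == new:
            raise ValueError(f"SID {new} is already going to be created")
    others_in_progress = bool(entries)
    entries.append((retrieved, new, login))
    return others_in_progress


if __name__ == "__main__":
    pfile = []
    print(add_pfile_entry(pfile, "1.3", "1.4", "xyz"))      # False: first entry
    print(add_pfile_entry(pfile, "1.2", "1.2.1.1", "abc"))  # True: another delta in progress
```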


Table 1 shows, for the most useful cases, what version of an SCCS file is retrieved by get, as well as the
SID of the version to be eventually created by delta, as a function of the SID specified to get.
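Table 1 can be read as a decision procedure. The Python sketch below encodes its sixteen cases under simplifying assumptions: an SCCS file is modeled as a set of SID tuples, use_b stands for the combination of the -b keyletter and the b flag, the d flag is not modeled, and new_sid is a hypothetical name. Comments give the case numbers from the table.

```python
def new_sid(specified, deltas, use_b=False):
    """Return (SID retrieved, SID of delta to be created) as laid out in Table 1.

    specified: None or a tuple of 1-4 SID components; deltas: existing SID tuples.
    """
    deltas = set(deltas)
    trunk = [d for d in deltas if len(d) == 2]
    releases = {r for r, _ in trunk}
    m_r = max(releases)

    def max_level(r):
        return max(l for rr, l in trunk if rr == r)

    def new_branch(r, l):
        # R.L.(mB+1).1: first sequence number on a new branch of R.L
        used = [d[2] for d in deltas if len(d) == 4 and d[:2] == (r, l)]
        return (r, l, max(used, default=0) + 1, 1)

    if specified is None:                      # cases 1-2: R defaults to mR
        specified = (m_r,)
    if len(specified) == 1:
        r = specified[0]
        if r >= m_r:                           # cases 1-6
            got = (m_r, max_level(m_r))
            if use_b:
                return got, new_branch(*got)
            return got, ((r, 1) if r > m_r else (m_r, got[1] + 1))
        if r not in releases:                  # case 7: fall back to hR
            lower = [rr for rr in releases if rr < r]
            if not lower:
                raise LookupError("no release below the one requested")
            r = max(lower)
        got = (r, max_level(r))                # cases 7-8 (-b irrelevant)
        return got, new_branch(*got)
    if len(specified) == 3:                    # cases 12-13
        seqs = [d[3] for d in deltas if len(d) == 4 and d[:3] == specified]
        if not seqs:
            raise LookupError("nonexistent branch")
        got = specified + (max(seqs),)
    elif specified in deltas:
        got = specified
    else:
        raise LookupError("nonexistent SID")
    if len(got) == 2:                          # cases 9-11
        if use_b or any(t > got for t in trunk):
            return got, new_branch(*got)
        return got, (got[0], got[1] + 1)
    # cases 12-16: got is a four-component branch SID
    branch_successor = any(
        len(d) == 4 and d[:3] == got[:3] and d[3] > got[3] for d in deltas)
    if use_b or branch_successor:
        return got, new_branch(got[0], got[1])
    return got, got[:3] + (got[3] + 1,)
```

A trunk successor of R.L is any trunk delta that compares greater than (R, L) as a tuple, which covers both a later level in release R and any delta in a higher release.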


5.1.5 Keyletters That Affect Output 


Specification of the -p keyletter causes get to write the retrieved text to the standard output, rather
than to a g-file. In addition, all output normally directed to the standard output (such as the SID of the
version retrieved and the number of lines retrieved) is directed instead to the diagnostic output. This
may be used, for example, to create g-files with arbitrary names:


get -p s.abc > arbitrary-filename


The -p keyletter is particularly useful when used with the ‘‘!’’ or ‘‘$’’ arguments of the PWB/UNIX
send(I) command. For example:


send MOD=s.abc REL=3 compile

if file ‘‘compile’’ contains:


//plicomp job job-card-information
//step1 exec plicke
//pli.sysin dd *
~-s
~!get -p -rREL MOD
/*
//


will send the highest level of release 3 of file ‘‘s.abc’’. Note that the line ‘‘~-s’’, which causes send(I)
to make ID keyword substitutions before detecting and interpreting control lines, is necessary if send(I)
is to substitute ‘‘s.abc’’ for MOD and ‘‘3’’ for REL in the line ‘‘~!get -p -rREL MOD’’.


The -s keyletter suppresses all output that is normally directed to the standard output. Thus, the SID
of the retrieved version, the number of lines retrieved, etc., are not output. This does not, however,
affect messages to the diagnostic output. This keyletter is used to prevent non-diagnostic messages
from appearing on the user’s terminal, and is often used in conjunction with the -p keyletter to
‘‘pipe’’ the output of get, as in:


get -p -s s.abc | nroff


The -g keyletter is supplied to suppress the actual retrieval of the text of a version of the SCCS file.
This may be useful in a number of ways. For example, to verify the existence of a particular SID in an
SCCS file, one may execute:


get -g -r4.3 s.abc


4. See Section 6.1 for a discussion of how different users are permitted to use SCCS commands on the same files. 




TABLE 1. Determination of New SID


      SID         -b Keyletter  Other                         SID          SID of Delta
Case  Specified*  Used†         Conditions                    Retrieved    to be Created
 1.   none‡       no            R defaults to mR              mR.mL        mR.(mL+1)
 2.   none‡       yes           R defaults to mR              mR.mL        mR.mL.(mB+1).1
 3.   R           no            R > mR                        mR.mL        R.1§
 4.   R           no            R = mR                        mR.mL        mR.(mL+1)
 5.   R           yes           R > mR                        mR.mL        mR.mL.(mB+1).1
 6.   R           yes           R = mR                        mR.mL        mR.mL.(mB+1).1
 7.   R           -             R < mR and R does not exist   hR.mL**      hR.mL.(mB+1).1
 8.   R           -             Trunk successor in release    R.mL         R.mL.(mB+1).1
                                > R and R exists
 9.   R.L         no            No trunk successor            R.L          R.(L+1)
10.   R.L         yes           No trunk successor            R.L          R.L.(mB+1).1
11.   R.L         -             Trunk successor in release    R.L          R.L.(mB+1).1
                                ≥ R
12.   R.L.B       no            No branch successor           R.L.B.mS     R.L.B.(mS+1)
13.   R.L.B       yes           No branch successor           R.L.B.mS     R.L.(mB+1).1
14.   R.L.B.S     no            No branch successor           R.L.B.S      R.L.B.(S+1)
15.   R.L.B.S     yes           No branch successor           R.L.B.S      R.L.(mB+1).1
16.   R.L.B.S     -             Branch successor              R.L.B.S      R.L.(mB+1).1


* ‘‘R’’, ‘‘L’’, ‘‘B’’, and ‘‘S’’ are the ‘‘release’’, ‘‘level’’, ‘‘branch’’, and ‘‘sequence’’ components of the SID, respectively;
‘‘m’’ means ‘‘maximum’’. Thus, for example, ‘‘R.mL’’ means ‘‘the maximum level number within release R’’;
‘‘R.L.(mB+1).1’’ means ‘‘the first sequence number on the new branch (i.e., maximum branch number plus 1) of level L
within release R’’. Note that if the SID specified is of the form ‘‘R.L’’, ‘‘R.L.B’’, or ‘‘R.L.B.S’’, each of the specified
components must exist.

† The -b keyletter is effective only if the b flag (see admin(I)) is present in the file. In this table, an entry of ‘‘-’’ means
‘‘irrelevant’’.

‡ This case applies if the d (default SID) flag is not present in the file. If the d flag is present in the file, then the SID
obtained from the d flag is interpreted as if it had been specified on the command line. Thus, one of the other cases in this
table applies.

§ This case is used to force the creation of the first delta in a new release.

** ‘‘hR’’ is the highest existing release that is lower than the specified, nonexistent, release R.


This outputs the given SID if it exists in the SCCS file, or it generates an error message, if it does not.
Another use of the -g keyletter is in regenerating a p-file that may have been accidentally destroyed:


get -e -g s.abc


The -l keyletter causes the creation of an l-file, which is named by replacing the ‘‘s.’’ of the SCCS file
name with ‘‘l.’’. This file is created in the current directory, with mode 444 (read-only), and is owned
by the real user. It contains a table (whose format is described in get(I)) showing which deltas were
used in constructing a particular version of the SCCS file. For example:


get -r2.3 -l s.abc


generates an l-file showing which deltas were applied to retrieve version 2.3 of the SCCS file. Specifying
a value of ‘‘p’’ with the -l keyletter, as in:


get -lp -r2.3 s.abc


causes the generated output to be written to the standard output rather than to the l-file. Note that the
-g keyletter may be used with the -l keyletter to suppress the actual retrieval of the text.

The -m keyletter is of use in identifying, line by line, the changes applied to an SCCS file.
Specification of this keyletter causes each line of the generated g-file to be preceded by the SID of the
delta that caused that line to be inserted. The SID is separated from the text of the line by a tab
character.


The -n keyletter causes each line of the generated g-file to be preceded by the value of the %M% ID
keyword (see Section 5.1.1) and a tab character. The -n keyletter is most often used in a pipeline with
grep(I). For example, to find, in the latest version of each SCCS file in a directory, all lines that match a
given pattern, the following may be executed:


get -p -n -s directory | grep pattern


If both the -m and -n keyletters are specified, each line of the generated g-file is preceded by the
value of the %M% ID keyword and a tab (this is the effect of the -n keyletter), followed by the line in
the format produced by the -m keyletter. Because use of the -m keyletter and/or the -n keyletter
causes the contents of the g-file to be modified, such a g-file must not be used for creating a delta.
Therefore, neither the -m keyletter nor the -n keyletter may be specified together with the -e
keyletter.
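The order of the prefixes when both keyletters are given is easy to get backwards, so here is a small Python model of the line format. It assumes each line of text is already paired with the SID of the delta that inserted it; annotate is a hypothetical name and does not reproduce get itself.

```python
def annotate(lines, module=None, with_sid=False):
    """Mimic the line prefixes produced by get -m and get -n.

    lines: (sid, text) pairs, sid being the delta that inserted the line.
    module: value of %M% to prefix (the -n effect), or None to omit it.
    """
    result = []
    for sid, text in lines:
        if with_sid:
            text = f"{sid}\t{text}"      # -m: SID, tab, text
        if module is not None:
            text = f"{module}\t{text}"   # -n: %M% value, tab, then the -m form (if any)
        result.append(text)
    return result


if __name__ == "__main__":
    for line in annotate([("1.1", "alpha"), ("1.3", "beta")], module="abc", with_sid=True):
        print(line)
```

Filtering the result for a pattern plays the role of the grep stage in the pipeline shown above.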


See get(I) for a full description of additional get keyletters.


5.2 delta


The delta command is used to incorporate the changes made to a g-file into the corresponding SCCS file,
i.e., to create a delta, and, therefore, a new version of the file.


Invocation of the delta command requires the existence of a p-file (see Sections 5.1.3 and 5.1.4). Delta
examines the p-file to verify the presence of an entry containing the user’s login name. If none is
found, an error message results. Delta also performs the same permission checks that get performs
when invoked with the -e keyletter. If all checks are successful, delta determines what has been
changed in the g-file, by comparing it (via diff(I)) with its own, temporary copy of the g-file as it was
before editing. This temporary copy of the g-file is called the d-file (its name is formed by replacing the
‘‘s.’’ of the SCCS file name with ‘‘d.’’) and is obtained by performing an internal get at the SID specified
in the p-file entry.


The required p-file entry is the one containing the login name of the user executing delta, because the
user who retrieved the g-file must be the one who will create the delta. However, if the login name of
the user appears in more than one entry (i.e., the same user executed get with the -e keyletter more
than once on the same SCCS file), the -r keyletter must be used with delta to specify the SID that is to
be used by the internal get to obtain the d-file. The SID specified must, of course, appear in one of the
entries in the p-file; this entry is the one used to obtain the SID of the delta to be created.
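The entry-selection rule can be sketched in a few lines. As in the earlier p-file sketch, the p-file is modeled (by assumption) as a list of (retrieved SID, new SID, login name) tuples; select_pfile_entry is a hypothetical name, and the value given with -r is matched against the retrieved SID, since that is the SID the internal get uses.

```python
def select_pfile_entry(entries, login, r=None):
    """Pick the p-file entry that delta should use.

    entries: (retrieved_sid, new_sid, login) tuples; r: value given with -r, if any.
    """
    mine = [e for e in entries if e[2] == login]
    if r is not None:
        mine = [e for e in mine if e[0] == r]
    if not mine:
        raise LookupError(f"no matching p-file entry for {login}")
    if len(mine) > 1:
        raise LookupError("more than one entry for this user; -r is required")
    return mine[0]


if __name__ == "__main__":
    pfile = [("1.3", "1.4", "xyz"), ("1.2", "1.2.1.1", "abc"), ("1.1", "1.1.1.1", "abc")]
    print(select_pfile_entry(pfile, "abc", r="1.1"))  # ('1.1', '1.1.1.1', 'abc')
```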


In practice, the most common invocation of delta is:
delta s.abc 

which prompts on the standard output (but only if it is a terminal): 
comments? 


to which the user replies with a description of why the delta is being made, terminating the reply with a
newline character. The user’s response may be up to 512 characters long, with newlines not intended to
terminate the response escaped by ‘‘\’’.
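Reading such a response can be sketched as follows. This is a minimal Python illustration, not delta’s code: read_response is a hypothetical name, dropping the escaping backslash while keeping the newline is an assumption, and so is rejecting an over-long response with an error.

```python
import io

MAX_RESPONSE = 512  # length limit stated in the text


def read_response(stream):
    """Read one reply: a newline ends it unless escaped by a backslash."""
    parts = []
    for line in stream:
        if line.endswith("\\\n"):
            parts.append(line[:-2] + "\n")  # escaped newline: the reply continues
            continue
        parts.append(line.rstrip("\n"))     # unescaped newline: the reply ends here
        break
    response = "".join(parts)
    if len(response) > MAX_RESPONSE:
        raise ValueError("response longer than 512 characters")
    return response


if __name__ == "__main__":
    reply = io.StringIO("fixed overflow \\\nin parser\nnext input\n")
    print(repr(read_response(reply)))  # 'fixed overflow \nin parser'
```

The same reading rule applies to the reply to the ‘‘MRs?’’ prompt described next.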


If the SCCS file has a v flag, delta first prompts with:
MRs? 


on the standard output. (Again, this prompt is printed only if the standard output is a terminal.) The
standard input is then read for MR⁵ numbers, separated by blanks and/or tabs, terminated in the same
manner as the response to the prompt ‘‘comments?’’.


5. In a tightly controlled environment, it is expected that deltas are created only as a result of some trouble report, change
request, trouble ticket, etc. (collectively called here Modification Requests, or MRs) and that it is desirable or necessary to
record such MR number(s) within each delta.




The -y and/or -m keyletters are used to supply the commentary (comments and MR numbers,
respectively) on the command line, rather than through the standard input. For example:


delta -y"descriptive comment" -m"mrnum1 mrnum2" s.abc


In this case, the corresponding prompts are not printed, and the standard input is not read. The -m
keyletter is allowed only if the SCCS file has a v flag. These keyletters are useful when delta is executed
from within a Shell procedure (see sh(I)).


The commentary (comments and/or MR numbers), whether solicited by delta or supplied via
keyletters, is recorded as part of the entry for the delta being created, and applies to all SCCS files
processed by the same invocation of delta. This implies that if delta is invoked with more than one file
argument, and the first file named has a v flag, all files named must have this flag. Similarly, if the first
file named does not have this flag, then none of the files named may have it. Any file that does not
conform to these rules is not processed.
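The rule reduces to comparing every file against the first one named. A Python sketch, assuming the presence of the v flag for each file is already known (here passed in as a dictionary); partition_by_v_flag is a hypothetical name.

```python
def partition_by_v_flag(files, has_v_flag):
    """Split the files named on one delta command into (processed, rejected).

    has_v_flag: mapping from file name to whether that SCCS file has the v flag.
    """
    if not files:
        return [], []
    wanted = has_v_flag[files[0]]   # the first file named sets the rule
    processed = [f for f in files if has_v_flag[f] == wanted]
    rejected = [f for f in files if has_v_flag[f] != wanted]
    return processed, rejected


if __name__ == "__main__":
    flags = {"s.abc": True, "s.def": False, "s.ghi": True}
    print(partition_by_v_flag(["s.abc", "s.def", "s.ghi"], flags))
```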


When processing is complete, delta outputs (on the standard output) the SID of the created delta
(obtained from the p-file entry) and the counts of lines inserted, deleted, and left unchanged by the
delta. Thus, a typical output might be:


1.4
14 inserted
7 deleted
345 unchanged


It is possible that the counts of lines reported as inserted, deleted, or unchanged by delta do not agree
with the user’s perception of the changes applied to the g-file. The reason for this is that there usually
are a number of ways to describe a set of such changes, especially if lines are moved around in the
g-file, and delta is likely to find a description that differs from the user’s perception. However, the total
number of lines of the new delta (the number inserted plus the number left unchanged) should agree
with the number of lines in the edited g-file.
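The point about differing descriptions can be illustrated with Python’s difflib, used here only as a stand-in for diff(I); delta itself relies on diff, so its counts may well differ from this sketch’s. Whatever description is chosen, the inserted and unchanged counts add up to the length of the edited file, and the deleted and unchanged counts add up to the length of the old one. delta_counts is a hypothetical name.

```python
import difflib


def delta_counts(old_lines, new_lines):
    """Count lines inserted, deleted, and unchanged between two versions of a file."""
    inserted = deleted = unchanged = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        elif tag == "delete":
            deleted += i2 - i1
        elif tag == "insert":
            inserted += j2 - j1
        else:  # "replace": the old lines count as deleted, the new ones as inserted
            deleted += i2 - i1
            inserted += j2 - j1
    return inserted, deleted, unchanged


if __name__ == "__main__":
    old = ["a", "b", "c"]
    new = ["a", "x", "c", "d"]
    n_ins, n_del, n_unch = delta_counts(old, new)
    print(n_ins, "inserted;", n_del, "deleted;", n_unch, "unchanged")
```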


If, in the process of making a delta, delta finds no ID keywords in the edited g-file, the message:
No id keywords (cm7) 


is issued after the prompts for commentary, but before any other output. This indicates that any ID
keywords that may have existed in the SCCS file have been replaced by their values, or deleted during
the editing process. This could be caused by creating a delta from a g-file that was created by a get
without the -e keyletter (recall that ID keywords are replaced by get in that case), or by accidentally
deleting or changing the ID keywords during the editing of the g-file. Another possibility is that the file
may never have had any ID keywords. In any case, it is left up to the user to determine what remedial
action is necessary, but the delta is made, unless there is an i flag in the SCCS file, indicating that this
should be treated as a fatal error. In this last case, the delta is not created.


After processing of an SCCS file is complete, the corresponding p-file entry is removed from the p-file.⁶
If there is only one entry in the p-file, then the p-file itself is removed.


In addition, delta removes the edited g-file, unless the -n keyletter is specified. Thus:

delta -n s.abc

will keep the g-file upon completion of processing.


The -s (‘‘silent’’) keyletter suppresses all output that is normally directed to the standard output,
other than the prompts ‘‘comments?’’ and ‘‘MRs?’’. Thus, use of the -s keyletter together with the
-y keyletter (and possibly, the -m keyletter) causes delta neither to read the standard input nor to
write the standard output.


6. All updates to the p-file are made to a temporary copy, the q-file, whose use is similar to the use of the x-file, which is
described in Section 4 above.

The differences between the g-file and the d-file (see above), which constitute the delta, may be printed
on the standard output by using the -p keyletter. The format of this output is similar to that produced
by diff(I).


5.3 admin


The admin command is used to administer SCCS files, that is, to create new SCCS files and to change
parameters of existing ones. When an SCCS file is created, its parameters are initialized by use of
keyletters or are assigned default values if no keyletters are supplied. The same keyletters are used to
change the parameters of existing files.


Two keyletters are supplied for use in conjunction with detecting and correcting ‘‘corrupted’’ SCCS files,
and are discussed in Section 6.3 below.


Newly created SCCS files are given mode 444 (read-only) and are owned by the effective user.


Only a user with write permission in the directory containing the SCCS file may use the admin command
upon that file.


5.3.1 Creation of SCCS Files 
An SCCS file may be created by executing the command:

admin -ifirst s.abc


in which the value (‘‘first’’) of the -i keyletter specifies the name of a file from which the text of the
initial delta of the SCCS file ‘‘s.abc’’ is to be taken. Omission of the value of the -i keyletter indicates
that admin is to read the standard input for the text of the initial delta. Thus, the command:


admin -i s.abc < first


is equivalent to the previous example. If the text of the initial delta does not contain ID keywords, the 
message: 


No id keywords (cm7) 


is issued by admin as a warning. However, if the same invocation of the command also sets the i flag
(not to be confused with the -i keyletter), the message is treated as an error and the SCCS file is not
created. Only one SCCS file may be created at a time using the -i keyletter.


When an SCCS file is created, the release number assigned to its first delta is normally ‘‘1’’, and its level
number is always ‘‘1’’. Thus, the first delta of an SCCS file is normally ‘‘1.1’’. The -r keyletter is
used to specify the release number to be assigned to the first delta. Thus:


admin -ifirst -r3 s.abc


indicates that the first delta should be named ‘‘3.1’’ rather than ‘‘1.1’’. Because this keyletter is only
meaningful in creating the first delta, its use is only permitted with the -i keyletter.


5.3.2 Initialization and Modification of SCCS File Parameters


The portion of the SCCS file reserved for descriptive text (see Section 6.2) may be initialized or changed
through the use of the -t keyletter. The descriptive text is intended as a summary of the contents and
purpose of the SCCS file, although its contents may be arbitrary, and it may be arbitrarily long.


When an SCCS file is being created and the -t keyletter is supplied, it must be followed by the name of
a file from which the descriptive text is to be taken. For example, the command:


admin -ifirst -tdesc s.abc

specifies that the descriptive text is to be taken from file ‘‘desc’’.


When processing an existing SCCS file, the -t keyletter specifies that the descriptive text (if any)
currently in the file is to be replaced with the text in the named file. Thus:


admin -tdesc s.abc




specifies that the descriptive text of the SCCS file is to be replaced by the contents of ‘‘desc’’; omission
of the file name after the -t keyletter, as in:


admin -t s.abc

causes the removal of the descriptive text from the SCCS file.


The flags (see Section 6.2) of an SCCS file may be initialized or changed through the use of the -f
keyletter, and deleted through the use of the -d keyletter. The flags of an SCCS file are used to direct certain actions of the
various commands. See admin(I) for a description of all the flags. For example, the v flag specifies
that delta is to prompt for Modification Request (MR) numbers, and the d (default SID) flag specifies
the default version of the SCCS file to be retrieved by the get command. The -f keyletter is used to set
a flag and, possibly, to set its value. For example:


admin -ifirst -fv -fmmodname s.abc


sets the v flag and the m (module name) flag. The value ‘‘modname’’ specified for the m flag is the
value that the get command will use to replace the %M% ID keyword. (In the absence of the m flag,
the name of the g-file is used as the replacement for the %M% ID keyword.) Note that several -f
keyletters may be supplied on a single invocation of admin, and that -f keyletters may be supplied
whether the command is creating a new SCCS file or processing an existing one.


The -d keyletter is used to delete a flag from an SCCS file, and may only be specified when processing
an existing file. As an example, the command:


admin -dm s.abc


removes the m flag from the SCCS file. Several -d keyletters may be supplied on a single invocation of
admin, and may be intermixed with -f keyletters.


SCCS files contain a list (user list) of login names of users who are allowed to create deltas (see Sections
5.1.3 and 6.2). This list is empty by default, which implies that anyone may create deltas. To add login
names to the list, the -a keyletter is used. For example:


admin -axyz -awql s.abc


adds the login names ‘‘xyz’’ and ‘‘wql’’ to the list. The -a keyletter may be used whether admin is
creating a new SCCS file or processing an existing one, and may appear several times. The -e keyletter
is used in an analogous manner if one wishes to remove (‘‘erase’’) login names from the list.


5.4 prt 


Prt is used to format and print on the standard output all or parts of an SCCS file (see Section 6.2),
preceded by the file's name. The portions of the file to be printed are selected by specifying certain
keyletters, which, together with the output formats they generate, are fully described in prt(I). This
section only briefly describes the -d, -u, -f, and -t keyletters, which are sufficient to print all of the
more interesting portions of an SCCS file.


The -d keyletter is used to print the delta table of an SCCS file. The delta table is that portion of the
file that contains information relevant to the creation of each delta of the file, namely the SID of the
delta, the date and time of creation, the login name of the creator, and the numbers of lines inserted,
deleted, and unchanged by the delta. The commentary that is entered when a delta is created is also
part of the delta table. Thus, executing the command:


prt -d s.abc


provides a history of the evolution of the SCCS file. In the absence of any keyletters, the -d keyletter
is assumed.


The -u keyletter is used to print the user list. The -f keyletter causes the printing of all the flags of
the SCCS file. The -t keyletter is used to print the descriptive text of the SCCS file (see Section 6.2); this
could be used, for example, to generate a complete set of file summaries, by executing:


prt -t sccs


in which "sccs" is the name of a directory containing the SCCS files.

Although prt makes the examination of SCCS files convenient, other PWB/UNIX commands (e.g., ed(I),
grep(I)) can be used to create customized print commands in the form of Shell procedures.


5.5 help 


The help command prints explanations of SCCS commands and of messages that these commands may
print. Arguments to help, zero or more of which may be supplied, are simply the names of SCCS
commands or the code numbers that appear in parentheses after SCCS messages. If no argument is given,
help prompts for one. Help has no concept of keyletter arguments or file arguments. Explanatory
information related to an argument, if it exists, is printed on the standard output. If no information is
found, an error message is printed. Note that each argument is processed independently, and an error
resulting from one argument will not terminate the processing of the other arguments.


Explanatory information related to a command is a synopsis of the command. For example:


help ge5 rmdel


produces:


ge5:

"nonexistent sid"

The specified sid does not exist in the
given file.

Check for typos.


rmdel:
rmdel -rSID name ...


5.6 rmdel 


The rmdel command is provided to allow removal of a delta from an SCCS file, though its use should be
reserved for those cases in which incorrect, global changes were made a part of the delta to be
removed.


The delta to be removed must be a "leaf" delta. That is, it must be the latest (most recently created)
delta on its branch or on the trunk of the SCCS file tree. In Figure 3, only deltas 1.3.1.2, 1.3.2.2, and
2.2 can be removed; once they are removed, then deltas 1.3.2.1 and 2.1 can be removed, and so on.
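The leaf rule can be checked mechanically from a list of SIDs. The following Python sketch is illustrative only, not part of SCCS: the SID list is a hypothetical tree chosen to match the deltas named above rather than a copy of Figure 3, the function names are hypothetical, and it assumes that a trunk delta from which a branch grows is not a leaf, in addition to the "latest on its branch or trunk" rule stated in the text.

```python
def parse_sid(sid: str) -> tuple[int, ...]:
    """Split an SID such as "1.3.2.1" into integer components (1, 3, 2, 1)."""
    return tuple(int(part) for part in sid.split("."))


def is_leaf(sid: str, all_sids: list[str]) -> bool:
    """Return True if `sid` could be removed under the leaf rule.

    Assumption: a trunk delta (R.L) is a leaf only if no later trunk delta
    exists AND no branch (R.L.B.S) grows out of it; a branch delta is a leaf
    if no later delta exists on the same R.L.B branch.
    """
    me = parse_sid(sid)
    others = [parse_sid(s) for s in all_sids if s != sid]
    if len(me) == 2:
        later_on_trunk = any(len(o) == 2 and o > me for o in others)
        has_branch = any(len(o) == 4 and o[:2] == me for o in others)
        return not (later_on_trunk or has_branch)
    return not any(len(o) == 4 and o[:3] == me[:3] and o[3] > me[3] for o in others)


def leaf_deltas(all_sids: list[str]) -> list[str]:
    """List the removable (leaf) deltas in SID order."""
    return sorted((s for s in all_sids if is_leaf(s, all_sids)), key=parse_sid)


# Hypothetical delta tree: trunk 1.1 through 2.2 with two branches off delta 1.3.
tree = ["1.1", "1.2", "1.3", "1.4", "2.1", "2.2",
        "1.3.1.1", "1.3.1.2", "1.3.2.1", "1.3.2.2"]
print(leaf_deltas(tree))  # ['1.3.1.2', '1.3.2.2', '2.2']
```

Removing the three leaves from this hypothetical tree and calling leaf_deltas again yields 1.3.1.1, 1.3.2.1, and 2.1, which mirrors the "and so on" progression described above.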


To be allowed to remove a delta, the effective user must have write permission in the directory
containing the SCCS file. In addition, the real user must either be the one who created the delta being
removed, or be the owner of the SCCS file and its directory.


The -r keyletter, which is mandatory, is used to specify the complete SID of the delta to be removed
(i.e., it must have two components for a trunk delta, and four components for a branch delta). Thus:


rmdel -r2.3 s.abc


specifies the removal of (trunk) delta "2.3" of the SCCS file. Before removal of the delta, rmdel checks
that the release number (R) of the given SID satisfies the relation:


floor ≤ R ≤ ceiling


In addition, the login name of the user must appear in the file's user list, or the user list must be empty.
If these conditions are not satisfied, processing is terminated, and the delta is not removed. After the
specified delta has been removed, its type indicator in the delta table of the SCCS file (see Section 6.2) is
changed from "D" (for "delta") to "R" (for "removed").
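The release-range and user-list checks, and the change of type indicator, can be modeled as below. This is a minimal Python sketch, not rmdel's implementation: the SccsFileInfo record and the function names are hypothetical, the relation is taken to be inclusive (floor ≤ R ≤ ceiling), the leaf-delta and directory-permission checks are left out, and the default floor of 1 and ceiling of 9999 are assumptions borrowed from later SCCS releases.

```python
from dataclasses import dataclass, field


@dataclass
class SccsFileInfo:
    """Hypothetical summary of the parts of an SCCS file that rmdel consults."""
    floor: int = 1        # assumed default release floor
    ceiling: int = 9999   # assumed default release ceiling
    user_list: list[str] = field(default_factory=list)  # empty: anyone allowed
    delta_types: dict[str, str] = field(default_factory=dict)  # SID -> "D" or "R"


def may_remove(info: SccsFileInfo, sid: str, login: str) -> bool:
    """Apply the release-range and user-list checks described above."""
    release = int(sid.split(".")[0])
    in_range = info.floor <= release <= info.ceiling
    allowed = not info.user_list or login in info.user_list
    return in_range and allowed


def mark_removed(info: SccsFileInfo, sid: str, login: str) -> bool:
    """Change the delta's type from "D" to "R" if the checks pass."""
    if not may_remove(info, sid, login):
        return False  # processing terminates; the delta is not removed
    info.delta_types[sid] = "R"
    return True


info = SccsFileInfo(floor=2, ceiling=5, user_list=["xyz"], delta_types={"2.3": "D"})
print(mark_removed(info, "2.3", "xyz"), info.delta_types)
```

Keeping the checks separate from the state change makes it easy to report why a removal was refused before anything in the file is touched.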


5.7 chghist 


The chghist command is used to change a delta's commentary that was supplied when that delta was
created. Its invocation is analogous to that of the rmdel command, except that the delta to be processed
is not required to be a leaf delta. For example:


chghist -r3.4 s.abc

specifies that the commentary of delta "3.4" of the SCCS file is to be changed.




The new commentary is solicited by chghist in the manner of the delta command. The old commentary
associated with the specified delta is kept, but it is preceded by a comment line indicating that it has
been changed (i.e., superseded), and the new commentary is entered ahead of this comment line. The
"inserted" comment line records the login name of the user executing chghist and the time of its
execution.
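The reordering of commentary that chghist performs can be sketched as a list operation. In this Python sketch the marker-line wording ("*** CHANGED ***" followed by date, time, and login name) is an assumption modeled on later SCCS releases, and change_commentary is a hypothetical name; the point is the ordering: new commentary, then the marker, then the preserved old commentary.

```python
from datetime import datetime


def change_commentary(old: list[str], new: list[str], login: str,
                      when: datetime) -> list[str]:
    """Return the commentary as chghist would leave it (sketch).

    The new commentary comes first, then a line recording who changed it
    and when, then the superseded commentary, which is kept rather than
    deleted. The marker wording is an assumption, not taken from the text.
    """
    marker = f"*** CHANGED *** {when:%y/%m/%d %H:%M:%S} {login}"
    return new + [marker] + old


print(change_commentary(["fix typo in banner"], ["fix MR 1234: banner text"],
                        "xyz", datetime(1977, 5, 1, 14, 30, 0)))
```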


5.8 what 


The what command is used to find identifying information within any PWB/UNIX file whose name is
given as an argument to what. Directory names and a name of "-" (a lone minus sign) are not
treated specially, as they are by other SCCS commands, and no keyletters are accepted by the command.


What searches the given file(s) for all occurrences of the string "@(#)", which is the replacement for
the %Z% ID keyword (see get(I)), and prints (on the standard output) what follows that string until the
first double quote ("), greater than (>), newline, or (non-printing) NUL character. Thus, for example,
if the SCCS file "s.prog.c" (which is a C program) contains the following line (the %M% and %I%
ID keywords were defined in Section 5.1.1):


char id[] = "%Z%%M%:%I%";


and then the command:


get -r3.4 s.prog.c


is executed, and finally the resulting g-file is compiled to produce "prog.o" and "a.out", then the
command:


what prog.c prog.o a.out

produces:


prog.c: 
prog.c:3.4 

prog.o: 
prog.c:3.4 

a.out: 
prog.c:3.4 


The string searched for by what need not be inserted via an ID keyword of get; it may be inserted in any
convenient manner.
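The scan that what performs can be sketched in Python. This is a minimal illustration, not the PWB/UNIX source: it finds every "@(#)" and collects the bytes that follow up to the first double quote, greater-than sign, newline, or NUL. The function name what_strings is hypothetical, and the command-line part simply prints each file name followed by its identification strings, approximating the listing shown above.

```python
import sys

MARKER = b"@(#)"
# Bytes that end an identification string: double quote, greater than,
# newline, and NUL.
TERMINATORS = b'">\n\0'


def what_strings(data: bytes) -> list[str]:
    """Return the text following each "@(#)" up to the first terminator."""
    found = []
    start = data.find(MARKER)
    while start != -1:
        end = start + len(MARKER)
        while end < len(data) and data[end] not in TERMINATORS:
            end += 1
        found.append(data[start + len(MARKER):end].decode("ascii", "replace"))
        start = data.find(MARKER, end)
    return found


if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            print(f"{name}:")
            for ident in what_strings(f.read()):
                print(f"\t{ident}")
```

Because the scan works on raw bytes, it finds the string equally well in source text, object files, and executables, which is why the same identification appears for prog.c, prog.o, and a.out.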


5.9 sccsdiff 


The sccsdiff command determines (and prints on the standard output) the differences between two
specified versions of one or more SCCS files. The versions to be compared are specified by using the -r
keyletter, whose format is the same as for the get command. The two versions must be specified as the
first two arguments to this command in the order in which they were created, i.e., the older version is
specified first. Any following keyletters are interpreted as arguments to the pr(I) command (which
actually prints the differences) and must appear before any file names. SCCS files to be processed are
named last. Directory names and a name of "-" (a lone minus sign) are not acceptable to sccsdiff.


The differences are printed in the form generated by diff(I). The following is an example of the
invocation of sccsdiff:


sccsdiff -r3.4 -r5.6 s.abc


5.10 comb


Comb generates a Shell procedure (see sh(I)) which attempts to reconstruct the named SCCS files so that
the reconstructed files are smaller than the originals. The generated Shell procedure is written on the
standard output.


Named SCCS files are reconstructed by discarding unwanted deltas and combining specified other deltas.
The intended use is for those SCCS files that contain deltas that are so old that they are no longer
useful. It is not recommended that comb be used as a matter of routine; its use should be restricted to a
very small number of times in the life of an SCCS file.

In the absence of any keyletters, comb preserves only leaf deltas and the minimum number of ancestor
deltas necessary to preserve the "shape" of the SCCS file tree. The effect of this is to eliminate
"middle" deltas on the trunk and on all branches of the tree. Thus, in Figure 3, deltas 1.2, 1.3.2.1, 1.4,
and 2.1 would be eliminated. Some of the keyletters are summarized as follows:


The -p keyletter specifies the oldest delta that is to be preserved in the reconstruction. All older
deltas are discarded.

The -c keyletter specifies a list (see get(I) for the syntax of such a list) of deltas to be preserved. All
other deltas are discarded.


The -s keyletter causes the generation of a Shell procedure, which, when run, produces only a report
summarizing the percentage space (if any) to be saved by reconstructing each named SCCS file. It is
recommended that comb be run with this keyletter (in addition to any others desired) before any actual
reconstructions.


It should be noted that the Shell procedure generated by comb is not guaranteed to save any space. In
fact, it is possible for the reconstructed file to be larger than the original. Note, too, that the shape of
the SCCS file tree may be altered by the reconstruction process.


6. SCCS FILES 


This section discusses several topics that must be considered before extensive use is made of SCCS.
These topics deal with the protection mechanisms relied upon by SCCS, the format of SCCS files, and the
recommended procedures for auditing SCCS files.


6.1 Protection 


SCCS relies on the capabilities of the PWB/UNIX operating system for most of the protection mechanisms
required to prevent unauthorized changes to SCCS files (i.e., changes made by non-SCCS commands).
The only protection features provided directly by SCCS are the release floor and ceiling flags, and the user
list (see Section 5.1.3).


New SCCS files created by the admin command are given mode 444 (read only). It is recommended that
this mode not be changed, as it prevents any direct modification of the files by non-SCCS commands. It
is further recommended that the directories containing SCCS files be given mode 755, which allows only
the owner of the directory to modify its contents.


SCCS files should be kept in directories that contain only SCCS files and any temporary files created by
SCCS commands. This simplifies protection and auditing of SCCS files (see Section 6.3). The contents
of directories should correspond to convenient logical groupings, e.g., sub-systems of a large project.


SCCS files must have only one link (name). The reason for this is that those commands that modify
SCCS files do so by creating a temporary copy of the file (called the x-file, see Section 4) and, upon
completion of processing, remove the old file and rename the x-file. If the old file has more than one link,
removing it and renaming the x-file would break the link. Rather than process such files, SCCS
commands produce an error message. All SCCS files must have names that begin with "s.".
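The protection recommendations in this section (mode 444 for SCCS files, mode 755 for their directories, a single link per file, and names beginning with "s.") can be checked mechanically. The following Python sketch is an illustrative audit helper, not a PWB/UNIX command; the function name audit_sccs_directory is hypothetical, and it only inspects names beginning with "s.", skipping any temporary files SCCS commands may leave behind.

```python
import os
import stat


def audit_sccs_directory(directory: str) -> list[str]:
    """Report deviations from the protection recommendations above.

    Only names beginning with "s." are checked; other names (for example,
    temporary files left by SCCS commands) are skipped.
    """
    problems = []
    dir_mode = stat.S_IMODE(os.stat(directory).st_mode)
    if dir_mode != 0o755:
        problems.append(f"{directory}: mode {dir_mode:o}, recommended 755")
    for name in sorted(os.listdir(directory)):
        if not name.startswith("s."):
            continue
        st = os.lstat(os.path.join(directory, name))
        if not stat.S_ISREG(st.st_mode):
            problems.append(f"{name}: not a regular file")
            continue
        mode = stat.S_IMODE(st.st_mode)
        if mode != 0o444:
            problems.append(f"{name}: mode {mode:o}, recommended 444")
        if st.st_nlink != 1:
            problems.append(f"{name}: {st.st_nlink} links, SCCS requires exactly one")
    return problems
```

An empty result means every checked file follows the recommendations; each problem string names the file and what differs, so the output can be reviewed or mailed as part of a routine audit.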


When only one user (or a group of users who share the same PWB/UNIX user identification number, or
user ID; see passwd(I)) uses SCCS, the real and effective user IDs are the same, and that user ID owns
the directories containing SCCS files. In addition, when several users share the same user ID (even
though they may have different login names), all such users have identical file permissions. Therefore,
SCCS may be used directly by any one of these users, without any preliminary preparation.


However, there are situations (for example, in large software development projects) in which it is not
practical to give the same user ID to all users of SCCS. In these cases, one user (equivalently, one user
ID) must be chosen as the "owner" of the SCCS files and be the one who will "administer" them (e.g.,
by using the admin command). This user is termed the SCCS administrator for that project. Because
other users of SCCS do not have the same privileges and permissions as the SCCS administrator, they are
not able to execute directly those commands that require write permission in the directory containing
the SCCS files. Therefore, a project-dependent program is required to provide an interface to the get,
delta, and, if desired, rmdel and chghist commands.


The interface program must be owned by the SCCS administrator, and must have the set-user-ID-on-
execution bit on (see chmod(I)), so that the effective user ID is the user ID of the administrator. This
program's function is to invoke the desired SCCS command and to cause it to inherit the privileges of the
interface program for the duration of that command's execution. In this manner, the owner of an SCCS
file can modify it at will. Other users whose login names are in the user list for that file (but who are
not its owners) are given the necessary permissions only for the duration of the execution of the
interface program, and are thus able to modify the SCCS files only through the use of delta and,
possibly, rmdel and chghist. The project-dependent interface program, as its name implies, must be
custom-built for each project.


6.2 Format


SCCS files are composed of lines of ASCII text⁷ arranged in six parts, as follows:


Checksum          A line containing the "logical" sum of all the characters of the file (not including
                  this checksum itself).

Delta Table       Information about each delta, such as its type, its SID, date and time of creation,
                  and commentary.

User Names        List of login names of users who are allowed to modify the file by adding or
                  removing deltas.

Flags             Indicators that control certain actions of various SCCS commands.

Descriptive Text  Arbitrary text provided by the user; usually a summary of the contents and
                  purpose of the file.

Body              Actual text that is being administered by SCCS, intermixed with internal SCCS
                  control lines.

Detailed information about the contents of the various sections of the file may be found in sccsfile(V);
the checksum is the only portion of the file which is of interest below.


It is important to note that because SCCS files are ASCII files, they may be processed by various
PWB/UNIX commands, such as ed(I), grep(I), and cat(I). This is very convenient in those instances in
which an SCCS file must be modified manually (e.g., when the time and date of a delta were recorded
incorrectly because the system clock was set incorrectly), or when it is desired simply to "look" at the
file.


Extreme care should be exercised when modifying SCCS files with non-SCCS commands.


6.3 Auditing 


On rare occasions, perhaps due to an operating system or hardware malfunction, an SCCS file, or
portions of it (i.e., one or more "blocks"), can be destroyed. SCCS commands (like most PWB/UNIX
commands) issue an error message when a file does not exist. In addition, SCCS commands use the
checksum stored in the SCCS file to determine whether a file has been corrupted since it was last accessed
(possibly by having lost one or more blocks, or by having been modified with, for example, ed(I)). No
SCCS command will process a corrupted SCCS file except the admin command with the -h or -z
keyletters, as described below.


It is recommended that SCCS files be audited (checked) for possible corruptions on a regular basis. The
simplest and fastest way to perform an audit is to execute the admin command with the -h keyletter
on all SCCS files:


admin -h s.file1 s.file2 ...

or

admin -h directory1 directory2 ...


7. Versions of SCCS up to and including Version 3 used non-ASCII files. Therefore, files created by earlier versions of SCCS 
are incompatible with Version 4 of SCCS. 




If the new checksum of any file is not equal to the checksum in the first line of that file, the message: 
corrupted file (co6) 


is produced for that file. This process continues until all the files have been examined. When
examining directories (as in the second example above), the process just described will not detect missing
files. A simple way to detect whether any files are missing from a directory is to periodically execute the
ls(I) command on that directory, and compare the outputs of the most current and the previous
executions. Any file whose name appears in the previous output but not in the current one has been
removed by some means.
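The listing comparison described above amounts to a set difference between two directory snapshots. A minimal Python sketch, with hypothetical helper names; in practice the previous listing would be saved to a file between runs, much as the ls(I) output would be.

```python
import os


def snapshot(directory: str) -> list[str]:
    """Current directory listing, sorted roughly as ls would list it."""
    return sorted(os.listdir(directory))


def missing_since(previous: list[str], current: list[str]) -> list[str]:
    """Names in the earlier listing that no longer appear in the current one."""
    return sorted(set(previous) - set(current))


# Example with two saved listings of an SCCS directory (hypothetical names):
before = ["s.abc", "s.main.c", "s.prog.c"]
after = ["s.abc", "s.prog.c"]
print(missing_since(before, after))  # ['s.main.c']
```

Names that appear only in the current listing (newly created files) are deliberately ignored, since only disappearances indicate a possible loss.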


Whenever a file has been corrupted, the manner in which the file is restored depends upon the extent 
of the corruption. If damage is extensive, the best solution is to contact the local PWB/UNIX operations 
group to request a restoral of the file from a backup copy. In the case of minor damage, repair through 
use of the editor ed(I) may be possible. In the latter case, after such repair, the following command 
must be executed: 


admin -z s.file


The purpose of this is to recompute the checksum to bring it into agreement with the actual contents of
the file. After this command is executed on a file, any corruption which may have existed in that file
will no longer be detectable.
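To illustrate what admin -h compares and what admin -z rewrites, here is a Python sketch of checksum verification and recomputation. It rests on assumptions not stated in this section: that the checksum line consists of the control character SOH (octal 001), the letter h, and a five-digit decimal number, and that the "logical sum" is the sum of every byte after that first line, taken modulo 65536. These details follow later descriptions of the SCCS file format; the function names are hypothetical.

```python
def checksum(body: bytes) -> int:
    """Assumed algorithm: 16-bit sum of every byte after the checksum line."""
    return sum(body) & 0xFFFF


def split_checksum_line(data: bytes) -> tuple[int, bytes]:
    """Separate the stored checksum from the rest of the file."""
    first, _, body = data.partition(b"\n")
    if not first.startswith(b"\x01h"):
        raise ValueError("first line is not a checksum line")
    return int(first[2:]), body


def is_corrupted(data: bytes) -> bool:
    """Mimic the comparison made by admin -h."""
    stored, body = split_checksum_line(data)
    return stored != checksum(body)


def recompute(data: bytes) -> bytes:
    """Mimic admin -z: rewrite the checksum line to match the body."""
    _, body = split_checksum_line(data)
    return b"\x01h%05d\n" % checksum(body) + body
```

Note that recompute makes a damaged file pass the check again, which is exactly why any corruption that existed before admin -z is no longer detectable afterward.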
REFERENCES


[1] Ritchie, D. M., and Thompson, K. The UNIX Time-Sharing System. Comm. ACM 17(7):365-375,
July 1974.

[2] Kernighan, B. W. UNIX for Beginners. Bell Laboratories, 1973.

[3] Kernighan, B. W. A Tutorial Introduction to the UNIX Text Editor. Bell Laboratories, 1973.

[4] Dolotta, T. A., Haight, R. C., and Piskorik, E. M., eds. PWB/UNIX User's Manual, Edition 1.0.
Bell Laboratories, May 1977.

[5] Kernighan, B. W., and Ritchie, D. M. UNIX Programming. Bell Laboratories, 1973.



NROFF/TROFF User’s Manual 


Joseph F. Ossanna 


Bell Laboratories 
Murray Hill, New Jersey 07974 


Introduction 


NROFF and TROFF are text processors under the PDP-11 UNIX Time-Sharing System¹ that format text
for typewriter-like terminals and for a Graphic Systems phototypesetter, respectively. They accept lines
of text interspersed with lines of format control information and format the text into a printable,
paginated document having a user-designed style. NROFF and TROFF offer unusual freedom in
document styling, including: arbitrary style headers and footers; arbitrary style footnotes; multiple
automatic sequence numbering for paragraphs, sections, etc.; multiple column output; dynamic font and
point-size control; arbitrary horizontal and vertical local motions at any point; and a family of automatic
overstriking, bracket construction, and line drawing functions.


NROFF and TROFF are highly compatible with each other and it is almost always possible to prepare 
input acceptable to both. Conditional input is provided that enables the user to embed input expressly 
destined for either program. NROFF can prepare output directly for a variety of terminal types and is 
capable of utilizing the full resolution of each terminal. 


Usage


The general form of invoking NROFF (or TROFF) at UNIX command level is


nroff options files        (or troff options files)


where options represents any of a number of option arguments and files represents the list of files
containing the document to be formatted. An argument consisting of a single minus (-) is taken to be a
file name corresponding to the standard input. If no file names are given, input is taken from the
standard input. The options, which may appear in any order so long as they appear before the files, are:


Option      Effect


-olist      Print only pages whose page numbers appear in list, which consists of comma-separated
            numbers and number ranges. A number range has the form N-M and means pages N
            through M; an initial -N means from the beginning to page N; and a final N- means
            from N to the end.

-nN         Number first generated page N.

-sN         Stop every N pages. NROFF will halt prior to every N pages (default N=1) to allow
            paper loading or changing, and will resume upon receipt of a newline. TROFF will
            stop the phototypesetter every N pages, produce a trailer to allow changing cassettes,
            and will resume after the phototypesetter START button is pressed.

-mname      Prepends the macro file /usr/lib/tmac.name to the input files.

-raN        Register a (one-character) is set to N.

-i          Read standard input after the input files are exhausted.

-q          Invoke the simultaneous input-output mode of the rd request.


NROFF Only

-Tname      Specifies the name of the output terminal type. Currently defined names are 37 for
            the (default) Model 37 Teletype, tn300 for the GE TermiNet 300 (or any terminal
            without half-line capabilities), 300S for the DASI-300S, 300 for the DASI-300, and
            450 for the DASI-450 (Diablo Hyterm).

-e          Produce equally-spaced words in adjusted lines, using full terminal resolution.


TROFF Only

-t          Direct output to the standard output instead of the phototypesetter.

-f          Refrain from feeding out paper and stopping phototypesetter at the end of the run.

-w          Wait until phototypesetter is available, if currently busy.

-b          TROFF will report whether the phototypesetter is busy or available. No text
            processing is done.

-a          Send a printable (ASCII) approximation of the results to the standard output.

-pN         Print all characters in point size N while retaining all prescribed spacings and
            motions, to reduce phototypesetter elapsed time.

-g          Prepare output for the Murray Hill Computation Center phototypesetter and direct
            it to the standard output.


Each option is invoked as a separate argument; for example,

nroff -o4,8-10 -T300S -mabc file1 file2

requests formatting of pages 4, 8, 9, and 10 of a document contained in the files named file1 and file2,
specifies the output terminal as a DASI-300S, and invokes the macro package abc.
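The -o page-list rules can be expressed as a small predicate. This Python sketch is illustrative only, not NROFF's code; page_selected is a hypothetical name, and it uses an ASCII hyphen as the range separator. Under these rules a list such as 4,8-10 selects pages 4, 8, 9, and 10.

```python
def page_selected(page_list: str, page: int) -> bool:
    """Return True if `page` is selected by an -o page list such as "4,8-10"."""
    for item in page_list.split(","):
        if "-" in item:
            low, _, high = item.partition("-")
            # An empty lower bound ("-N") means from the beginning;
            # an empty upper bound ("N-") means to the end.
            if (not low or page >= int(low)) and (not high or page <= int(high)):
                return True
        elif int(item) == page:
            return True
    return False


print([p for p in range(1, 13) if page_selected("4,8-10", p)])  # [4, 8, 9, 10]
```

Testing one page at a time, rather than expanding the list into a set, lets open-ended ranges such as N- work without knowing the document's length in advance.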


Various pre- and post-processors are available for use with NROFF and TROFF. These include the
equation preprocessors NEQN and EQN² (for NROFF and TROFF respectively), and the table-
construction preprocessor TBL³. A reverse-line postprocessor COL⁴ is available for multiple-column
NROFF output on terminals without reverse-line ability; COL expects the Model 37 Teletype escape
sequences that NROFF produces by default. TK⁴ is a 37 Teletype simulator postprocessor for printing
NROFF output on a Tektronix 4014. TCAT⁴ is a phototypesetter-simulator postprocessor for TROFF that
produces an approximation of phototypesetter output on a Tektronix 4014. For example, in

tbl files | eqn | troff -t options | tcat


the first | indicates the piping of TBL's output to EQN's input; the second the piping of EQN's output to
TROFF's input; and the third indicates the piping of TROFF's output to TCAT. GCAT⁴ can be used to
send TROFF (-g) output to the Murray Hill Computation Center.


The remainder of this manual consists of: a Summary and Index; a Reference Manual keyed to the 
index; and a set of Tutorial Examples. Another tutorial is [5]. 


Joseph F. Ossanna 

SUMMARY AND INDEX


Request                         Initial        If No
Form                            Value*         Argument       Notes#  Explanation


1. General Explanation

2. Font and Character Size Control

.ps ±N                          10 point       previous       E       Point size; also \s±N.†
.ss N                           12/36 em       ignored        E       Space-character size set to N/36 em.†
.cs F N M                       off            -              P       Constant character space (width) mode (font F).†
.bd F N                         off            -              P       Embolden font F by N-1 units.†
.bd S F N                       off            -              P       Embolden Special Font when current font is F.†
.ft F                           Roman          previous       E       Change to font F = x, xx, or 1-4. Also \fx, \f(xx, \fN.
.fp N F                         R,I,B,S        ignored        -       Font named F mounted on physical position 1≤N≤4.

3. Page Control

.pl ±N                          11 in          11 in          v       Page length.
.bp ±N                          N=1            -              B‡,v    Eject current page; next page number N.
.pn ±N                          N=1            ignored        -       Next page number N.
.po ±N                          0; 26/27 in    previous       v       Page offset.
.ne N                           -              N=1 V          D,v     Need N vertical space (V = vertical spacing).
.mk R                           none           internal       D       Mark current vertical place in register R.
.rt ±N                          none           internal       D,v     Return (upward only) to marked vertical place.

4. Text Filling, Adjusting, and Centering

.br                             -              -              B       Break.
.fi                             fill           -              B,E     Fill output lines.
.nf                             fill           -              B,E     No filling or adjusting of output lines.
.ad c                           adj, both      adjust         E       Adjust output lines with mode c.
.na                             adjust         -              E       No output line adjusting.
.ce N                           off            N=1            B,E     Center following N input text lines.

5. Vertical Spacing

.vs N                           1/6 in; 12 pts previous       E,p     Vertical base line spacing (V).
.ls N                           N=1            previous       E       Output N-1 Vs after each text output line.
.sp N                           -              N=1 V          B,v     Space vertical distance N in either direction.
.sv N                           -              N=1 V          v       Save vertical distance N.
.os                             -              -              -       Output saved vertical distance.
.ns                             space          -              D       Turn no-space mode on.
.rs                             -              -              D       Restore spacing; turn no-space mode off.

6. Line Length and Indenting

.ll ±N                          6.5 in         previous       E,m     Line length.
.in ±N                          N=0            previous       B,E,m   Indent.
.ti ±N                          -              ignored        B,E,m   Temporary indent.

7. Macros, Strings, Diversion, and Position Traps

.de xx yy                       -              .yy=..         -       Define or redefine macro xx; end at call of yy.
.am xx yy                       -              .yy=..         -       Append to a macro.
.ds xx string                   -              ignored        -       Define a string xx containing string.
.as xx string                   -              ignored        -       Append string to string xx.


* Values separated by ";" are for NROFF and TROFF respectively.
# Notes are explained at the end of this Summary and Index.
† No effect in NROFF.
‡ The use of "'" as control character (instead of ".") suppresses the break function.
.rm xx                          -              ignored        -       Remove request, macro, or string.
.rn xx yy                       -              ignored        -       Rename request, macro, or string xx to yy.
.di xx                          -              end            D       Divert output to macro xx.
.da xx                          -              end            D       Divert and append to xx.
.wh N xx                        -              -              v       Set location trap; negative is w.r.t. page bottom.
.ch xx N                        -              -              v       Change trap location.
.dt N xx                        -              off            D,v     Set a diversion trap.
.it N xx                        -              off            E       Set an input-line count trap.
.em xx                          none           none           -       End macro is xx.

8. Number Registers

.nr R ±N M                      -              -              u       Define and set number register R; auto-increment by M.
.af R c                         arabic         -              -       Assign format to register R (c=1, i, I, a, A).
.rr R                           -              -              -       Remove register R.

9. Tabs, Leaders, and Fields

.ta Nt ...                      0.8; 0.5 in    none           E,m     Tab settings; left type, unless t=R (right), C (centered).
.tc c                           none           none           E       Tab repetition character.
.lc c                           .              none           E       Leader repetition character.
.fc a b                         off            off            -       Set field delimiter a and pad character b.

10. Input and Output Conventions and Character Translations

.ec c                           \              \              -       Set escape character.
.eo                             on             -              -       Turn off escape character mechanism.
.lg N                           -; on          on             -       Ligature mode on if N>0.
.ul N                           off            N=1            E       Underline (italicize in TROFF) N input lines.
.cu N                           off            N=1            E       Continuous underline in NROFF; like ul in TROFF.
.uf F                           Italic         Italic         -       Underline font set to F (to be switched to by ul).
.cc c                           .              .              E       Set control character to c.
.c2 c                           '              '              E       Set nobreak control character to c.
.tr abcd....                    none           -              O       Translate a to b, etc. on output.

11. Local Horizontal and Vertical Motions, and the Width Function

12. Overstrike, Bracket, Line-drawing, and Zero-width Functions

13. Hyphenation.

.nh                             hyphenate      -              E       No hyphenation.
.hy N                           hyphenate      hyphenate      E       Hyphenate; N = mode.
.hc c                           \%             \%             E       Hyphenation indicator character c.
.hw word1 ...                   -              ignored        -       Exception words.

14. Three Part Titles.

.tl 'left'center'right'         -              -              -       Three part title.
.pc c                           %              off            -       Page number character.
.lt ±N                          6.5 in         previous       E,m     Length of title.

15. Output Line Numbering.

.nm ±N M S I                    -              off            E       Number mode on or off, set parameters.
.nn N                           -              N=1            E       Do not number next N lines.

16. Conditional Acceptance of Input

.if c anything                  -              -              -       If condition c true, accept anything as input; for multi-line use \{anything\}.
.if !c anything                 -              -              -       If condition c false, accept anything.
.if N anything                  -              -              u       If expression N > 0, accept anything.
.if !N anything                 -              -              u       If expression N ≤ 0, accept anything.
.if 'string1'string2' anything  -              -              -       If string1 identical to string2, accept anything.
.if !'string1'string2' anything -              -              -       If string1 not identical to string2, accept anything.
.ie c anything                  -              -              u       If portion of if-else; all above forms (like if).
.el anything                    -              -              -       Else portion of if-else.

17. Environment Switching.

.ev N                           N=0            previous       -       Environment switched (push down).

18. Insertions from the Standard Input

.rd prompt                      -              prompt=BEL     -       Read insertion.
.ex                             -              -              -       Exit from NROFF/TROFF.

19. Input/Output File Switching

.so filename                    -              -              -       Switch source file (push down).
.nx filename                    -              end-of-file    -       Next file.
.pi program                     -              -              -       Pipe output to program (NROFF only).

20. Miscellaneous

.mc c N                         -              off            E,m     Set margin character c and separation N.
.tm string                      -              newline        -       Print string on terminal (UNIX standard message output).
.ig yy                          -              .yy=..         -       Ignore till call of yy.
.pm t                           -              all            -       Print macro names and sizes; if t present, print only total of sizes.
.fl                             -              -              B       Flush output buffer.

21. Output and Error Messages


Notes:

B        Request normally causes a break.
D        Mode or relevant parameters associated with current diversion level.
E        Relevant parameters are a part of the current environment.
O        Must stay in effect until logical output.
P        Mode must be still or again in effect at the time of physical output.
v,p,m,u  Default scale indicator; if not specified, scale indicators are ignored.


Alphabetical Request and Section Number Cross Reference


ad  4   cc 10   ds  7   fc  9   ie 16   ll  6   nh 13   pi 19   rn  7   ta  9   vs  5
af  8   ce  4   dt  7   fi  4   if 16   ls  5   nm 15   pl  3   rr  8   tc  9   wh  7
am  7   ch  7   ec 10   fl 20   ig 20   lt 14   nn 15   pm 20   rs  5   ti  6
as  7   cs  2   el 16   fp  2   in  6   mc 20   nr  8   pn  3   rt  3   tl 14
bd  2   cu 10   em  7   ft  2   it  7   mk  3   ns  5   po  3   so 19   tm 20
bp  3   da  7   eo 10   hc 13   lc  9   na  4   nx 19   ps  2   sp  5   tr 10
br  4   de  7   ev 17   hw 13   lg 10   ne  3   os  5   rd 18   ss  2   uf 10
c2 10   di  7   ex 18   hy 13   li 10   nf  4   pc 14   rm  7   sv  5   ul 10

Escape Sequences for Characters, Indicators, and Functions


Section     Escape
Reference   Sequence          Meaning

10.1        \\                \ (to prevent or delay the interpretation of \)
10.1        \e                Printable version of the current escape character.
2.1         \'                ´ (acute accent); equivalent to \(aa
2.1         \`                ` (grave accent); equivalent to \(ga
2.1         \-                − Minus sign in the current font
7           \.                Period (dot) (see de)
11.1        \(space)          Unpaddable space-size space character
11.1        \0                Digit width space
11.1        \|                1/6 em narrow space character (zero width in NROFF)
11.1        \^                1/12 em half-narrow space character (zero width in NROFF)
4.1         \&                Non-printing, zero width character
10.6        \!                Transparent line indicator
10.7        \"                Beginning of comment
7.3         \$N               Interpolate argument 1≤N≤9
13          \%                Default optional hyphenation character
2.1         \(xx              Character named xx
7.1         \*x, \*(xx        Interpolate string x or xx
9.1         \a                Non-interpreted leader character
12.3        \b'abc...'        Bracket building function
4.2         \c                Interrupt text processing
11.1        \d                Forward (down) 1/2 em vertical motion (1/2 line in NROFF)
2.2         \fx, \f(xx, \fN   Change to font named x or xx, or position N
11.1        \h'N'             Local horizontal motion; move right N (negative left)
11.3        \kx               Mark horizontal input place in register x
12.4        \l'Nc'            Horizontal line drawing function (optionally with c)
12.4        \L'Nc'            Vertical line drawing function (optionally with c)
8           \nx, \n(xx        Interpolate number register x or xx
12.1        \o'abc...'        Overstrike characters a, b, c, ...
4.1         \p                Break and spread output line
11.1        \r                Reverse 1 em vertical motion (reverse line in NROFF)
2.3         \sN, \s±N         Point-size change function
9.1         \t                Non-interpreted horizontal tab
11.1        \u                Reverse (up) 1/2 em vertical motion (1/2 line in NROFF)
11.1        \v'N'             Local vertical motion; move down N (negative up)
11.2        \w'string'        Interpolate width of string
5.2         \x'N'             Extra line-space function (negative before, positive after)
12.2        \zc               Print c with zero width (without spacing)
16          \{                Begin conditional input
16          \}                End conditional input
10.7        \(newline)        Concealed (ignored) newline
-           \X                X, any character not listed above


The escape sequences \\, \., \", \$, \*, \a, \n, \t, and \(newline) are interpreted in copy mode (§7.2).
Predefined General Number Registers


Section     Register
Reference   Name        Description

3           %           Current page number.
11.2        ct          Character type (set by width function).
7.4         dl          Width (maximum) of last completed diversion.
7.4         dn          Height (vertical size) of last completed diversion.
            dw          Current day of the week (1-7).
            dy          Current day of the month (1-31).
            hp          Current horizontal place on input line.
            ln          Output line number.
            mo          Current month (1-12).
4.1         nl          Vertical position of last printed text base-line.
11.2        sb          Depth of string below base line (generated by width function).
11.2        st          Height of string above base line (generated by width function).
            yr          Last two digits of current year.


Predefined Read-Only Number Registers


Section     Register
Reference   Name        Description

7.3         .$          Number of arguments available at the current macro level.
            .A          Set to 1 in TROFF, if -a option used; always 1 in NROFF.
            .H          Available horizontal resolution in basic units.
            .T          Set to 1 in NROFF, if -T option used; always 0 in TROFF.
            .V          Available vertical resolution in basic units.
5.2         .a          Post-line extra line-space most recently utilized using \x'N'.
            .c          Number of lines read from current input file.
7.4         .d          Current vertical place in current diversion; equal to nl, if no diversion.
2.2         .f          Current font as physical quadrant (1-4).
4.1         .h          Text base-line high-water mark on current page or diversion.
6           .i          Current indent.
6           .l          Current line length.
4.1         .n          Length of text portion on previous output line.
3           .o          Current page offset.
3           .p          Current page length.
2.3         .s          Current point size.
7.5         .t          Distance to the next trap.
4           .u          Equal to 1 in fill mode and 0 in nofill mode.
5.1         .v          Current vertical line spacing.
            .w          Width of previous character.
            .x          Reserved version-dependent register.
            .y          Reserved version-dependent register.
7.4         .z          Name of current diversion.
REFERENCE MANUAL


1. General Explanation 


1.1. Form of input. Input consists of text lines, which are destined to be printed, interspersed with control
lines, which set parameters or otherwise control subsequent processing. Control lines begin with a control
character (normally . (period) or ' (acute accent)) followed by a one or two character name that
specifies a basic request or the substitution of a user-defined macro in place of the control line. The
control character ' suppresses the break function (the forced output of a partially filled line) caused by
certain requests. The control character may be separated from the request/macro name by white space
(spaces and/or tabs) for esthetic reasons. Names must be followed by either space or newline. Control
lines with unrecognized names are ignored.


Various special functions may be introduced anywhere in the input by means of an escape character,
normally \. For example, the function \nR causes the interpolation of the contents of the number register
R in place of the function; here R is either a single character name as in \nx, or a left-parenthesis-introduced,
two-character name as in \n(xx.
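The two name forms can be told apart with one character of lookahead: a ( means two name characters follow, anything else is itself the name. The Python sketch below is illustrative only; read_name and interpolate_registers are hypothetical helpers, registers are a plain dictionary, an undefined register is assumed to read as 0, and escaped backslashes (\\n) are not handled.

```python
def read_name(text, pos):
    """Return (name, next_pos) for a name starting at text[pos].

    A '(' introduces a two-character name; any other character is
    itself a one-character name.
    """
    if text[pos] == "(":
        return text[pos + 1:pos + 3], pos + 3
    return text[pos], pos + 1


def interpolate_registers(text, registers):
    """Replace every backslash-n register reference with its value."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\n", i) and i + 2 < len(text):
            name, i = read_name(text, i + 2)
            out.append(str(registers.get(name, 0)))  # undefined -> 0 (assumed)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(interpolate_registers("x=\\nx, ab=\\n(ab.", {"x": 2, "ab": 15}))
# prints: x=2, ab=15.
```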


1.2. Formatter and device resolution. TROFF internally uses 432 units/inch, corresponding to the Graphic
Systems phototypesetter which has a horizontal resolution of 1/432 inch and a vertical resolution of
1/144 inch. NROFF internally uses 240 units/inch, corresponding to the least common multiple of the
horizontal and vertical resolutions of various typewriter-like output devices. TROFF rounds
horizontal/vertical numerical parameter input to the actual horizontal/vertical resolution of the Graphic
Systems typesetter. NROFF similarly rounds numerical input to the actual resolution of the output device
indicated by the -T option (default Model 37 Teletype).


1.3. Numerical parameter input. Both NROFF and TROFF accept numerical input with the appended scale
indicators shown in the following table, where S is the current type size in points, V is the current vertical
line spacing in basic units, and C is a nominal character width in basic units.


Scale                                 Number of basic units
Indicator   Meaning                   TROFF          NROFF

i           Inch                      432            240
c           Centimeter                432×50/127     240×50/127
P           Pica = 1/6 inch           72             240/6
m           Em = S points             6×S            C
n           En = Em/2                 3×S            C, same as Em
p           Point = 1/72 inch         6              240/72
u           Basic unit                1              1
v           Vertical line space       V              V
none        Default, see below


In NROFF, both the em and the en are taken to be equal to the C, which is output-device dependent;
common values are 1/10 and 1/12 inch. Actual character widths in NROFF need not be all the same
and constructed characters such as -> (→) are often extra wide. The default scaling is ems for the
horizontally-oriented requests and functions ll, in, ti, ta, lt, po, mc, \h, and \l; Vs for the vertically-oriented
requests and functions pl, wh, ch, dt, sp, sv, ne, rt, \v, \x, and \L; p for the vs request; and
u for the requests nr, if, and ie. All other requests ignore any scale indicators. When a number register
containing an already appropriately scaled number is interpolated to provide numerical input, the
unit scale indicator u may need to be appended to prevent an additional inappropriate default scaling.
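The scale table reduces to one multiplication per number followed by rounding to whole basic units. A minimal Python sketch, assuming round-half-up for the final rounding; the defaults V=72 (12 points in TROFF) and C=24 (1/10 inch in NROFF) are illustrative assumptions, not values fixed by the formatters.

```python
import math
from fractions import Fraction


def scale_table(troff=True, S=10, V=72, C=24):
    """Basic units per scale-indicator unit, following the table above.

    S: point size; V: vertical line space in basic units;
    C: NROFF nominal character width in basic units (device dependent).
    """
    inch = 432 if troff else 240
    em = Fraction(6 * S) if troff else Fraction(C)  # 6 TROFF units per point
    return {
        "i": Fraction(inch),
        "c": Fraction(inch * 50, 127),
        "P": Fraction(inch, 6),
        "m": em,
        "n": em / 2 if troff else em,  # in NROFF the en equals the em
        "p": Fraction(inch, 72),
        "u": Fraction(1),
        "v": Fraction(V),
    }


def to_units(number, indicator, **device):
    """Convert e.g. (3.2, "c") to an integer number of basic units."""
    exact = Fraction(str(number)) * scale_table(**device)[indicator]
    return math.floor(exact + Fraction(1, 2))  # round half up (assumed)
```

For instance, to_units(2.54, "c") gives 432 in TROFF, and to_units(1, "p", troff=False) gives 3, since 240/72 is not a whole number of NROFF units.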


The number, N, may be specified in decimal-fraction form but the parameter finally stored is rounded
to an integer number of basic units.


The absolute position indicator | may be prepended to a number N to generate the distance to the vertical
or horizontal place N. For vertically-oriented requests and functions, |N becomes the distance in basic
units from the current vertical place on the page or in a diversion (§7.4) to the vertical place N. For
all other requests and functions, |N becomes the distance from the current horizontal place on the input
line to the horizontal place N. For example,


.sp |3.2c
will space in the required direction to 3.2 centimeters from the top of the page.


1.4. Numerical expressions. Wherever numerical input is expected an expression involving parentheses,
the arithmetic operators +, -, /, *, % (mod), and the logical operators <, >, <=, >=, = (or ==),
& (and), : (or) may be used. Except where controlled by parentheses, evaluation of expressions is
left-to-right; there is no operator precedence. In the case of certain requests, an initial + or - is
stripped and interpreted as an increment or decrement indicator respectively. In the presence of default
scaling, the desired scale indicator must be attached to every number in an expression for which the
desired and default scaling differ. For example, if the number register x contains 2 and the current
point size is 10, then


.ll (4.25i+\nxP+3)/2u
will set the line length to 1/2 the sum of 4.25 inches + 2 picas + 30 points.
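Because there is no operator precedence, an evaluator only needs to fold operands from left to right, recursing on parentheses. The Python sketch below models TROFF units only. It is a hypothetical illustration: register references such as \nx are assumed to be already interpolated (so \nxP arrives as 2P), each scaled number is assumed to round to whole basic units, division and % are assumed to truncate toward zero, and a true logical result is 1.

```python
import math
import re
from fractions import Fraction

# A number with an optional scale indicator, or an operator/parenthesis.
TOKEN = re.compile(r"(\d*\.?\d+)([icPmnpuv]?)|(<=|>=|==|[-+*/%<>=&:()])")


def trunc_div(a, b):
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


OPS = {
    "+": lambda a, b: a + b, "-": lambda a, b: a - b,
    "*": lambda a, b: a * b, "/": trunc_div,
    "%": lambda a, b: a - b * trunc_div(a, b),
    "<": lambda a, b: int(a < b), ">": lambda a, b: int(a > b),
    "<=": lambda a, b: int(a <= b), ">=": lambda a, b: int(a >= b),
    "=": lambda a, b: int(a == b), "==": lambda a, b: int(a == b),
    "&": lambda a, b: int(a > 0 and b > 0),
    ":": lambda a, b: int(a > 0 or b > 0),
}


def tokenize(expr):
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot parse {expr[pos:]!r}")
        tokens.append(m.groups())  # (number, indicator, operator)
        pos = m.end()
    return tokens


def evaluate(expr, default="m", S=10, V=72):
    """Evaluate a TROFF numerical expression strictly left to right."""
    scale = {"i": 432, "c": Fraction(432 * 50, 127), "P": 72, "m": 6 * S,
             "n": 3 * S, "p": 6, "u": 1, "v": V}
    tokens = tokenize(expr)
    pos = 0

    def operand():
        nonlocal pos
        number, indicator, op = tokens[pos]
        pos += 1
        if op == "(":
            value = sequence()
            pos += 1  # skip the matching ")"
            return value
        if op in ("+", "-"):  # unary sign
            value = operand()
            return -value if op == "-" else value
        exact = Fraction(number) * scale[indicator or default]
        return math.floor(exact + Fraction(1, 2))

    def sequence():
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos][2] not in (None, ")"):
            op = tokens[pos][2]
            pos += 1
            value = OPS[op](value, operand())  # no precedence
        return value

    return sequence()
```

With x holding 2, evaluate("(4.25i+2P+3)/2u") yields 1080 units (2.5 inches), matching the .ll example; evaluate("2u+3u*4u") yields 20, not 14, because of the strict left-to-right order.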


1.5. Notation. Numerical parameters are indicated in this manual in two ways. ±N means that the
argument may take the forms N, +N, or -N and that the corresponding effect is to set the affected
parameter to N, to increment it by N, or to decrement it by N respectively. Plain N means that an initial
algebraic sign is not an increment indicator, but merely the sign of N. Generally, unreasonable
numerical input is either ignored or truncated to a reasonable value. For example, most requests
expect to set parameters to non-negative values; exceptions are sp, wh, ch, nr, and if. The requests
ps, ft, po, vs, ls, ll, in, and lt restore the previous parameter value in the absence of an argument.


Single character arguments are indicated by single lower case letters and one/two character arguments 
are indicated by a pair of lower case letters. Character string arguments are indicated by multi-character 
mnemonics. 


2. Font and Character Size Control 


2.1. Character set. The TROFF character set consists of the Graphic Systems Commercial II character
set plus a Special Mathematical Font character set, each having 102 characters. These character sets
are shown in the attached Table I. All ASCII characters are included, with some on the Special Font.
With three exceptions, the ASCII characters are input as themselves, and non-ASCII characters are input
in the form \(xx where xx is a two-character name given in the attached Table II. The three ASCII
exceptions are mapped as follows:


ASCII Input                   Printed by TROFF
Character   Name              Character   Name

'           acute accent      ’           close quote
`           grave accent      ‘           open quote
-           minus             ‐           hyphen


The characters ', `, and - may be input by \', \`, and \- respectively or by their names (Table II).
The ASCII characters @, #, ", ', `, <, >, \, {, }, ~, ^, and _ exist only on the Special Font and are
printed as a 1-em space if that Font is not mounted.


NROFF understands the entire TROFF character set, but can in general print only ASCII characters, 
additional characters as may be available on the output device, such characters as may be able to be 
constructed by overstriking or other combination, and those that can reasonably be mapped into other 
printable characters. The exact behavior is determined by a driving table prepared for each device. The 
characters ', `, and _ print as themselves.


2.2. Fonts. The default mounted fonts are Times Roman (R), Times Italic (I), Times Bold (B), and
the Special Mathematical Font (S) on physical typesetter positions 1, 2, 3, and 4 respectively. These
fonts are used in this document. The current font, initially Roman, may be changed (among the
mounted fonts) by use of the ft request, or by imbedding at any desired point either \fx, \f(xx, or \fN
where x and xx are the name of a mounted font and N is a numerical font position. It is not necessary
to change to the Special font; characters on that font are automatically handled. A request for a named
but not-mounted font is ignored. TROFF can be informed that any particular font is mounted by use of
the fp request. The list of known fonts is installation dependent. In the subsequent discussion of
font-related requests, F represents either a one/two-character font name or the numerical font position,
1-4. The current font is available (as numerical position) in the read-only number register .f.


NROFF understands font control and normally underlines Italic characters (see §10.5).


2.3. Character size. Character point sizes available on the Graphic Systems typesetter are 6, 7, 8, 9, 10,
11, 12, 14, 16, 18, 20, 22, 24, 28, and 36. This is a range of 1/12 inch to 1/2 inch. The ps request is
used to change or restore the point size. Alternatively the point size may be changed between any two
characters by imbedding a \sN at the desired point to set the size to N, or a \s±N (1≤N≤9) to
increment/decrement the size by N; \s0 restores the previous size. Requested point size values that are
between two valid sizes yield the larger of the two. The current size is available in the .s register.
NROFF ignores type size control.
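The rounding rule, together with the remark under ps that a paired +N, -N works because the requested value is remembered, can be modeled by tracking the requested size separately from the size actually used. A minimal Python sketch; the class name is hypothetical, requests below 6 are assumed to yield 6, and ps without an argument is assumed to swap the current and previous requested values.

```python
VALID_SIZES = (6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, 36)


def effective_size(requested):
    """Size actually used: smallest valid size >= requested, at most 36."""
    for size in VALID_SIZES:
        if requested <= size:
            return size
    return VALID_SIZES[-1]


class PointSize:
    """Tracks the requested size so that +N followed by -N returns exactly."""

    def __init__(self, initial=10):
        self.requested = initial
        self.previous = initial

    @property
    def current(self):
        return effective_size(self.requested)

    def ps(self, arg=None):
        """Model of .ps: "+N"/"-N" relative, "N" absolute, None restores."""
        if arg is None:
            # Restore the previous size; swapping the two is an assumption.
            self.requested, self.previous = self.previous, self.requested
            return
        new = self.requested + int(arg) if arg[0] in "+-" else int(arg)
        self.previous, self.requested = self.requested, new
```

Starting from 10 points, ps("+3") requests 13 and prints at 14; a following ps("-3") requests 10 again rather than 11, which is why the pair works.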


Request     Initial     If No
Form        Value       Argument    Notes*  Explanation


.ps ±N      10 point    previous    E       Point size set to ±N. Alternatively imbed \sN or \s±N.
                                            Any positive size value may be requested; if invalid, the
                                            next larger valid size will result, with a maximum of 36.
                                            A paired sequence +N, -N will work because the previous
                                            requested value is also remembered. Ignored in NROFF.

.ss N       12/36 em    ignored     E       Space-character size is set to N/36 ems. This size is the
                                            minimum word spacing in adjusted text. Ignored in
                                            NROFF.

.cs F N M   off         -           P       Constant character space (width) mode is set on for font
                                            F (if mounted); the width of every character will be
                                            taken to be N/36 ems. If M is absent, the em is that of
                                            the character's point size; if M is given, the em is M
                                            points. All affected characters are centered in this space,
                                            including those with an actual width larger than this
                                            space. Special Font characters occurring while the
                                            current font is F are also so treated. If N is absent, the
                                            mode is turned off. The mode must be still or again in
                                            effect when the characters are physically printed. Ignored
                                            in NROFF.

.bd F N     off         -           P       The characters in font F will be artificially emboldened by
                                            printing each one twice, separated by N-1 basic units. A
                                            reasonable value for N is 3 when the character size is in
                                            the vicinity of 10 points. If N is missing the embolden
                                            mode is turned off. The column heads above were
                                            printed with .bd I 3. The mode must be still or again in
                                            effect when the characters are physically printed. Ignored
                                            in NROFF.


*Notes are explained at the end of the Summary and Index above.
.bd S F N   off         -           P       The characters in the Special Font will be emboldened
                                            whenever the current font is F. This manual was printed
                                            with .bd S B 3. The mode must be still or again in effect
                                            when the characters are physically printed.

.ft F       Roman       previous    E       Font changed to F. Alternatively, imbed \fF. The font
                                            name P is reserved to mean the previous font.

.fp N F     R,I,B,S     ignored     -       Font position. This is a statement that a font named F is
                                            mounted on position N (1-4). It is a fatal error if F is
                                            not known. The phototypesetter has four fonts physically
                                            mounted. Each font consists of a film strip which can be
                                            mounted on a numbered quadrant of a wheel. The
                                            default mounting sequence assumed by TROFF is R, I, B,
                                            and S on positions 1, 2, 3 and 4.


3. Page control


Top and bottom margins are not automatically provided; it is conventional to define two macros and to
set traps for them at vertical positions 0 (top) and -N (N from the bottom). See §7 and Tutorial
Examples §T2. A pseudo-page transition onto the first page occurs either when the first break occurs or
when the first non-diverted text processing occurs. Arrangements for a trap to occur at the top of the
first page must be completed before this transition. In the following, references to the current diversion
(§7.4) mean that the mechanism being described works during both ordinary and diverted output (the
former considered as the top diversion level).


The usable page width on the Graphic Systems phototypesetter is about 7.54 inches, beginning about
1/27 inch from the left edge of the 8 inch wide, continuous roll paper. The physical limitations on
NROFF output are output-device dependent.


Request     Initial         If No
Form        Value           Argument    Notes   Explanation


.pl ±N      11 in           11 in       v       Page length set to ±N. The internal limitation is about
                                                75 inches in TROFF and about 136 inches in NROFF.
                                                The current page length is available in the .p register.

.bp ±N      N=1             -           B*,v    Begin page. The current page is ejected and a new page
                                                is begun. If ±N is given, the new page number will be
                                                ±N. Also see request ns.

.pn ±N      N=1             ignored     -       Page number. The next page (when it occurs) will have
                                                the page number ±N. A pn must occur before the initial
                                                pseudo-page transition to affect the page number of
                                                the first page. The current page number is in the %
                                                register.

.po ±N      0; 26/27 in†    previous    v       Page offset. The current left margin is set to ±N. The
                                                TROFF initial value provides about 1 inch of paper margin
                                                including the physical typesetter margin of 1/27 inch.
                                                In TROFF the maximum (line-length)+(page-offset) is
                                                about 7.54 inches. See §6. The current page offset is
                                                available in the .o register.

.ne N       -               N=1 V       D,v     Need N vertical space. If the distance, D, to the next
                                                trap position (see §7.5) is less than N, a forward vertical
                                                space of size D occurs, which will spring the trap. If
                                                there are no remaining traps on the page, D is the
                                                distance to the bottom of the page. If D<V, another
                                                line could still be output and spring the trap. In a
                                                diversion, D is the distance to the diversion trap, if any,
                                                or is very large.

.mk R       none            internal    D       Mark the current vertical place in an internal register
                                                (both associated with the current diversion level), or in
                                                register R, if given. See rt request.

.rt ±N      none            internal    D,v     Return upward only to a marked vertical place in the
                                                current diversion. If ±N (w.r.t. current place) is given,
                                                the place is ±N from the top of the page or diversion or,
                                                if N is absent, to a place marked by a previous mk. Note
                                                that the sp request (§5.3) may be used in all cases
                                                instead of rt by spacing to the absolute place stored in an
                                                explicit register, e.g. using the sequence .mk R ...
                                                .sp |\nRu.

*The use of "'" as control character (instead of ".") suppresses the break function.
†Values separated by ";" are for NROFF and TROFF respectively.
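The ne and rt rules above reduce to two small comparisons. A minimal Python sketch in basic units; the function names are hypothetical, diversions are not modeled, and D is supplied by the caller as the distance to the next trap or, when no traps remain, to the page bottom.

```python
def ne_space(n, d):
    """Forward space emitted by .ne N, given D (distance to next trap).

    If D < N, spacing D springs the trap; otherwise nothing is emitted.
    """
    return d if d < n else 0


def rt_motion(marked, current):
    """Vertical motion for .rt back to a marked place: upward only.

    The .sp |\\nRu alternative would instead move in either direction.
    """
    return marked - current if marked < current else 0
```

With 300 units left before the footer trap, ne_space(720, 300) spaces 300 units and springs the trap, while rt_motion(500, 400) does nothing because the mark lies below the current place.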


4. Text Filling, Adjusting, and Centering 


4.1. Filling and adjusting. Normally, words are collected from input text lines and assembled into an output
text line until some word doesn't fit. An attempt is then made to hyphenate the word in an effort to
assemble a part of it into the output line. The spaces between the words on the output line are then
increased to spread out the line to the current line length minus any current indent. A word is any string
of characters delimited by the space character or the beginning/end of the input line. Any adjacent pair
of words that must be kept together (neither split across output lines nor spread apart in the adjustment
process) can be tied together by separating them with the unpaddable space character "\ " (backslash-space).
The adjusted word spacings are uniform in TROFF and the minimum interword spacing can be
controlled with the ss request (§2). In NROFF, they are normally nonuniform because of quantization
to character-size spaces; however, the command line option -e causes uniform spacing with full output
device resolution. Filling, adjustment, and hyphenation (§13) can all be prevented or controlled. The
text length on the last line output is available in the .n register, and text base-line position on the page
for this line is in the nl register. The text base-line high-water mark (lowest place) on the current page
is in the .h register.
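Filling and NROFF-style adjustment can be sketched with every character one space wide. This Python sketch is a simplification under stated assumptions: the helper names are hypothetical, hyphenation and the extra end-of-sentence space are not modeled, and the leftover spaces go to the leftmost gaps (real NROFF's distribution rule may differ). The last line is left unadjusted, as at a break.

```python
def fill(words, width):
    """Greedily collect words into lines of at most `width` characters."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    return lines


def adjust(words, width):
    """Widen interword gaps so the line is exactly `width` characters.

    Spaces are whole characters, so gaps may differ by one; the extra
    spaces go to the leftmost gaps in this sketch.
    """
    if len(words) == 1:
        return words[0]
    gaps = len(words) - 1
    base, extra = divmod(width - sum(map(len, words)), gaps)
    out = words[0]
    for i, word in enumerate(words[1:]):
        out += " " * (base + (1 if i < extra else 0)) + word
    return out


def format_paragraph(text, width):
    """Fill and adjust text; the last line is output unadjusted (a break)."""
    lines = fill(text.split(), width)
    if not lines:
        return []
    adjusted = [adjust(line, width) for line in lines[:-1]]
    return adjusted + [" ".join(lines[-1])]
```

At width 12, "now is the time for all good men" fills as "now  is  the", "time for all", and "good men"; at width 13 the first line becomes "now   is  the", showing the nonuniform gaps the text attributes to NROFF.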


An input text line ending with ., ?, or ! is taken to be the end of a sentence, and an additional space 
character is automatically provided during filling. Multiple inter-word space characters found in the 
input are retained, except for trailing spaces; initial spaces also cause a break. 


When filling is in effect, a \p may be imbedded or attached to a word to cause a break at the end of the 
word and have the resulting output line spread out to fill the current line length. 


A text input line that happens to begin with a control character can be made to not look like a control 
line by prefacing it with the non-printing, zero-width filler character \&. Still another way is to specify 
output translation of some convenient character into the control character using tr (§10.5).


4.2. Interrupted text. The copying of an input line in nofill (non-fill) mode can be interrupted by terminating
the partial line with a \c. The next encountered input text line will be considered to be a continuation
of the same line of input text. Similarly, a word within filled text may be interrupted by terminating
the word (and line) with \c; the next encountered text will be taken as a continuation of the interrupted
word. If the intervening control lines cause a break, any partial line will be forced out along
with any partial word.


Request     Initial     If No
Form        Value       Argument    Notes   Explanation


.br         -           -           B       Break. The filling of the line currently being collected is
                                            stopped and the line is output without adjustment. Text
                                            lines beginning with space characters and empty text
                                            lines (blank lines) also cause a break.

.fi         fill on     -           B,E     Fill subsequent output lines. The register .u is 1 in fill
                                            mode and 0 in nofill mode.

.nf         fill on     -           B,E     Nofill. Subsequent output lines are neither filled nor
                                            adjusted. Input text lines are copied directly to output
                                            lines without regard for the current line length.

.ad c       adj, both   adjust      E       Line adjustment is begun. If fill mode is not on, adjustment
                                            will be deferred until fill mode is back on. If the
                                            type indicator c is present, the adjustment type is
                                            changed as shown in the following table.

                                            Indicator   Adjust Type
                                            l           adjust left margin only
                                            r           adjust right margin only
                                            c           center
                                            b or n      adjust both margins
                                            absent      unchanged

.na         adjust      -           E       Noadjust. Adjustment is turned off; the right margin will
                                            be ragged. The adjustment type for ad is not changed.
                                            Output line filling still occurs if fill mode is on.

.ce N       off         N=1         B,E     Center the next N input text lines within the current
                                            (line-length minus indent). If N=0, any residual count
                                            is cleared. A break occurs after each of the N input
                                            lines. If the input line is too long, it will be left adjusted.
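Where a line starts under each adjustment type (and under ce) follows from the space left in the line length minus the indent. A minimal Python sketch; line_start is a hypothetical helper, all lengths share one unit, an odd leftover when centering is assumed to go to the right, and starting too-long lines at the indent is stated in the text only for ce.

```python
def line_start(text_len, line_length, indent, mode):
    """Horizontal start of an output line; all lengths in the same units.

    mode is the ad indicator "l", "r", "c", or "b"/"n". Adjusting both
    margins stretches interword space instead, so it starts at the
    indent like "l". The ce request places lines like "c".
    """
    room = line_length - indent - text_len
    if mode in ("l", "b", "n") or room <= 0:
        return indent  # too-long lines start at the indent (left adjusted)
    if mode == "r":
        return indent + room
    if mode == "c":
        return indent + room // 2  # odd leftover goes right (assumed)
    raise ValueError(f"unknown adjust type: {mode!r}")
```

For a 10-character line with line length 65 and indent 5, centering starts it at 30 and right adjustment at 55.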


5. Vertical Spacing 


5.1. Base-line spacing. The vertical spacing (V) between the base-lines of successive output lines can be
set using the vs request with a resolution of 1/144 inch = 1/2 point in TROFF, and to the output device
resolution in NROFF. V must be large enough to accommodate the character sizes on the affected output
lines. For the common type sizes (9-12 points), usual typesetting practice is to set V to 2 points
greater than the point size; TROFF default is 10-point type on a 12-point spacing (as in this document).
The current V is available in the .v register. Multiple-V line separation (e.g. double spacing) may be
requested with ls.


5.2. Extra line-space. If a word contains a vertically tall construct requiring the output line containing it
to have extra vertical space before and/or after it, the extra-line-space function \x'N' can be imbedded
in or attached to that word. In this and other functions having a pair of delimiters around their parameter
(here '), the delimiter choice is arbitrary, except that it can't look like the continuation of a number
expression for N. If N is negative, the output line containing the word will be preceded by N extra
vertical space; if N is positive, the output line containing the word will be followed by N extra vertical
space. If successive requests for extra space apply to the same line, the maximum values are used.
The most recently utilized post-line extra line-space is available in the .a register.
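The combining rule for several \x functions on one output line is a pair of maxima, one for each sign. A minimal Python sketch with a hypothetical helper name; amounts are in basic units.

```python
def extra_line_space(amounts):
    """Combine the N values of all \\x'N' functions on one output line.

    Negative N requests space before the line, positive N after it;
    the maximum of each kind is used. Returns (before, after).
    """
    before = max((-n for n in amounts if n < 0), default=0)
    after = max((n for n in amounts if n > 0), default=0)
    return before, after
```

For amounts -10, 6, -4, and 12 the line gets 10 units before and 12 after; the "after" value is the post-line space that .a would then report.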


5.3. Blocks of vertical space. A block of vertical space is ordinarily requested using sp, which honors the
no-space mode and which does not space past a trap. A contiguous block of vertical space may be
reserved using sv.


Request     Initial         If No
Form        Value           Argument    Notes   Explanation


.vs N       1/6 in; 12 pts  previous    E,p     Set vertical base-line spacing size V. Transient extra
                                                vertical space available with \x'N' (see above).

.ls N       N=1             previous    E       Line spacing set to ±N. N-1 Vs (blank lines) are
                                                appended to each output text line. Appended blank lines
                                                are omitted, if the text or previous appended blank line
                                                reached a trap position.

.sp N       -               N=1 V       B,v     Space vertically in either direction. If N is negative, the
                                                motion is backward (upward) and is limited to the distance
                                                to the top of the page. Forward (downward)
                                                motion is truncated to the distance to the nearest trap. If
                                                the no-space mode is on, no spacing occurs (see ns, and
                                                rs below).

.sv N       -               N=1 V       v       Save a contiguous vertical block of size N. If the distance
                                                to the next trap is greater than N, N vertical space
                                                is output. No-space mode has no effect. If this distance
                                                is less than N, no vertical space is immediately output,
                                                but N is remembered for later output (see os). Subsequent
                                                sv requests will overwrite any still remembered N.

.os         -               -           -       Output saved vertical space. No-space mode has no
                                                effect. Used to finally output a block of vertical space
                                                requested by an earlier sv request.

.ns         space           -           D       No-space mode turned on. When on, the no-space mode
                                                inhibits sp requests and bp requests without a next page
                                                number. The no-space mode is turned off when a line of
                                                output occurs, or with rs.

.rs         space           -           D       Restore spacing. The no-space mode is turned off.

Blank text line.            -           B       Causes a break and output of a blank line exactly like
                                                sp 1.
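The sv and os pair behaves like a one-slot buffer. A minimal Python sketch in basic units; the class is hypothetical, the distance to the next trap is supplied by the caller, and a distance exactly equal to N is treated as not fitting because the text covers only the greater and less cases.

```python
class SavedSpace:
    """Model of the sv/os pair; all distances in basic units."""

    def __init__(self):
        self.remembered = 0

    def sv(self, n, dist_to_trap):
        """Return the space output now for .sv N."""
        if dist_to_trap > n:
            return n  # fits: output immediately
        self.remembered = n  # overwrites any earlier remembered N
        return 0

    def os(self):
        """Return (and forget) the remembered block for .os."""
        n, self.remembered = self.remembered, 0
        return n
```

Typically os is issued from the top-of-page macro, so a block that did not fit at the bottom of one page is output at the top of the next.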


6. Line Length and Indenting 


The maximum line length for fill mode may be set with ll. The indent may be set with in; an indent
applicable to only the next output line may be set with ti. The line length includes indent space but not
page offset space. The line-length minus the indent is the basis for centering with ce. The effect of ll,
in, or ti is delayed, if a partially collected line exists, until after that line is output. In fill mode the
length of text on an output line is less than or equal to the line length minus the indent. The current
line length and indent are available in registers .l and .i respectively. The length of three-part titles
produced by tl (see §14) is independently set by lt.


Request     Initial     If No
Form        Value       Argument    Notes   Explanation


.ll ±N      6.5 in      previous    E,m     Line length is set to ±N. In TROFF the maximum
                                            (line-length)+(page-offset) is about 7.54 inches.

.in ±N      N=0         previous    B,E,m   Indent is set to ±N. The indent is prepended to each
                                            output line.

.ti ±N      -           ignored     B,E,m   Temporary indent. The next output text line will be
                                            indented a distance ±N with respect to the current
                                            indent. The resulting total indent may not be negative.
                                            The current indent is not changed.
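Following the ±N notation of §1.5, a signed ti argument is relative to the current indent and an unsigned one is absolute. A minimal Python sketch with a hypothetical helper; clamping a negative total to zero is an assumed reading of "may not be negative".

```python
def next_line_indent(indent, ti_arg=None):
    """Total indent of the next output line (same units as indent).

    ti_arg is the pending .ti argument as typed: "+N"/"-N" is relative
    to the current indent, a bare "N" is absolute. The current indent
    itself is left unchanged.
    """
    if ti_arg is None:
        return indent
    if ti_arg[0] in "+-":
        total = indent + int(ti_arg)
    else:
        total = int(ti_arg)
    return max(total, 0)  # total indent may not be negative (clamp assumed)
```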


7. Macros, Strings, Diversion, and Position Traps 


7.1. Macros and strings. A macro is a named set of arbitrary lines that may be invoked by name or with
a trap. A string is a named string of characters, not including a newline character, that may be interpolated
by name at any point. Request, macro, and string names share the same name list. Macro and
string names may be one or two characters long and may usurp previously defined request, macro, or
string names. Any of these entities may be renamed with rn or removed with rm. Macros are created
by de and di, and appended to by am and da; di and da cause normal output to be stored in a macro.
Strings are created by ds and appended to by as. A macro is invoked in the same way as a request: a
control line beginning .xx will interpolate the contents of macro xx. The remainder of the line may
contain up to nine arguments. The strings x and xx are interpolated at any desired point with \*x and
\*(xx respectively. String references and macro invocations may be nested.


7.2. Copy mode input interpretation. During the definition and extension of strings and macros (not by
diversion) the input is read in copy mode. The input is copied without interpretation except that:


• The contents of number registers indicated by \n are interpolated.

• Strings indicated by \* are interpolated.

• Arguments indicated by \$ are interpolated.

• Concealed newlines indicated by \(newline) are eliminated.

• Comments indicated by \" are eliminated.

• \t and \a are interpreted as ASCII horizontal tab and SOH respectively (§9).

• \\ is interpreted as \.

• \. is interpreted as ".".


These interpretations can be suppressed by prepending a \. For example, since \\ maps into a \, \\n
will copy as \n which will be interpreted as a number register indicator when the macro or string is
reread.


7.3. Arguments. When a macro is invoked by name, the remainder of the line is taken to contain up to
nine arguments. The argument separator is the space character, and arguments may be surrounded by
double-quotes to permit imbedded space characters. Pairs of double-quotes may be imbedded in
double-quoted arguments to represent a single double-quote. If the desired arguments won't fit on a
line, a concealed newline may be used to continue on the next line.
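The argument rules (space separation, double-quoted arguments, doubled quotes for a literal quote, at most nine) can be sketched as a small scanner. This Python version is a hypothetical illustration: concealed newlines are assumed to have been joined already, and anything after the ninth argument is simply dropped.

```python
def split_args(rest, limit=9):
    """Split the remainder of a macro call line into at most `limit` args.

    Arguments are separated by spaces; a double-quoted argument may hold
    spaces, and a doubled quote inside it stands for one quote character.
    """
    args, i, n = [], 0, len(rest)
    while i < n and len(args) < limit:
        if rest[i] == " ":
            i += 1
        elif rest[i] == '"':
            i += 1
            chars = []
            while i < n:
                if rest[i] == '"':
                    if i + 1 < n and rest[i + 1] == '"':
                        chars.append('"')  # doubled quote -> one quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(rest[i])
                i += 1
            args.append("".join(chars))
        else:
            start = i
            while i < n and rest[i] != " ":
                i += 1
            args.append(rest[start:i])
    return args
```

For example, the remainder "New York" "say ""hi""" x yields the three arguments New York, say "hi", and x.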


When a macro is invoked the input level is pushed down and any arguments available at the previous
level become unavailable until the macro is completely read and the previous level is restored. A
macro's own arguments can be interpolated at any point within the macro with \$N, which interpolates
the Nth argument (1≤N≤9). If an invoked argument doesn't exist, a null string results. For example,
the macro xx may be defined by


.de xx     \"begin definition
Today is \\$1 the \\$2.
..         \"end definition


and called by
.xx Monday 14th
to produce the text
Today is Monday the 14th.


Note that the \$ was concealed in the definition with a prepended \. The number of currently available
arguments is in the .$ register.
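Why the \$ must be concealed shows up when copy mode is modeled as two passes: once at definition time (top level, where no arguments exist) and once when the stored body is reread at the call. The Python sketch below handles only three of the copy-mode rules (escaped backslash, comment, argument reference) and reuses the same scanner for the reread, which is enough for \$N here but is not how the formatter fully interprets text. All names are hypothetical.

```python
def scan(line, args=()):
    """Copy one line, handling escaped backslashes, comments, and $N.

    An argument reference is replaced by the Nth of `args` (a null
    string if absent). All other escapes are passed through unchanged.
    """
    out, i = [], 0
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            nxt = line[i + 1]
            if nxt == "\\":  # escaped backslash copies as one backslash
                out.append("\\")
                i += 2
                continue
            if nxt == '"':  # comment: drop the rest of the line
                break
            if nxt == "$" and i + 2 < len(line) and line[i + 2] in "123456789":
                k = int(line[i + 2])
                out.append(args[k - 1] if k <= len(args) else "")
                i += 3
                continue
        out.append(line[i])
        i += 1
    return "".join(out)


def define(body):
    """Read a macro body at top level, where no arguments exist."""
    return [scan(line) for line in body]


def call(macro, args):
    """Reread the stored body with the call's arguments."""
    return [scan(line, args) for line in macro]


macro = define(["Today is \\\\$1 the \\\\$2."])
print(call(macro, ["Monday", "14th"]))  # ['Today is Monday the 14th.']
```

Without the extra backslash, the reference is consumed at definition time, where no arguments exist, so the stored line already reads "Today is ." and the call's arguments never appear.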


No arguments are available at the top (non-macro) level in this implementation. Because string
referencing is implemented as an input-level push down, no arguments are available from within a string.
No arguments are available within a trap-invoked macro.


Arguments are copied in copy mode onto a stack where they are available for reference. The mechanism
does not allow an argument to contain a direct reference to a long string (interpolated at copy time)
and it is advisable to conceal string references (with an extra \) to delay interpolation until argument
reference time.


7.4. Diversions. Processed output may be diverted into a macro for purposes such as footnote processing 
(see Tutorial §T5) or determining the horizontal and vertical size of some text for conditional changing 
of pages or columns. A single diversion trap may be set at a specified vertical position. The number 
registers dn and dl respectively contain the vertical and horizontal size of the most recently ended 
diversion. Processed text that is diverted into a macro retains the vertical size of each of its lines when 
reread in nofill mode regardless of the current V. Constant-spaced (cs) or emboldened (bd) text that is 
diverted can be reread correctly only if these modes are again or still in effect at reread time. One way 
to do this is to imbed in the diversion the appropriate cs or bd requests with the transparent mechanism
described in §10.6. 


Diversions may be nested and certain parameters and registers are associated with the current diversion
level (the top non-diversion level may be thought of as the 0th diversion level). These are the diversion
trap and associated macro, no-space mode, the internally-saved marked place (see mk and rt), the
current vertical place (.d register), the current high-water text base-line (.h register), and the current
diversion name (.z register).


7.5. Traps. Three types of trap mechanisms are available—page traps, a diversion trap, and an input- 
line-count trap. Macro-invocation traps may be planted using wh at any page position including the top. 
This trap position may be changed using ch. Trap positions at or below the bottom of the page have no 
effect unless or until moved to within the page or rendered effective by an increase in page length. 
Two traps may be planted at the same position only by first planting them at different positions and 
then moving one of the traps; the first planted trap will conceal the second unless and until the first one 
is moved (see Tutorial Examples §T5). If the first one is moved back, it again conceals the second 
trap. The macro associated with a page trap is automatically invoked when a line of text is output 
whose vertical size reaches or sweeps past the trap position. Reaching the bottom of a page springs the 
top-of-page trap, if any, provided there is a next page. The distance to the next trap position is avail- 
able in the .t register; if there are no traps between the current position and the bottom of the page, the 
distance returned is the distance to the page bottom. 
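The rule for the .t register can be modeled in a few lines. This is a hypothetical Python sketch, not NROFF/TROFF code: positions are plain integers in arbitrary vertical units, a negative trap position is measured from the page bottom as with wh, and a trap exactly at the current position is treated as already passed (an assumption the text does not settle).

```python
def trap_position(pos, page_length):
    """Normalize a wh-style position: negative values count up from the page bottom."""
    return pos if pos >= 0 else page_length + pos


def distance_to_next_trap(current, traps, page_length):
    """Hypothetical model of the .t register for page traps.

    Traps at or below the page bottom have no effect, so they are ignored.
    If no trap lies ahead, the distance to the page bottom is returned.
    """
    ahead = [
        p for p in (trap_position(t, page_length) for t in traps)
        if current < p < page_length
    ]
    return (min(ahead) if ahead else page_length) - current


if __name__ == "__main__":
    # A 66-line page with a header trap at 0 and a footer trap 6 lines from the bottom.
    print(distance_to_next_trap(10, [0, -6], 66))   # 50: the footer trap is at line 60
    print(distance_to_next_trap(61, [0, -6], 66))   # 5: only the page bottom remains
```
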


A macro-invocation trap effective in the current diversion may be planted using dt. The .t register
works in a diversion; if there is no subsequent trap a large distance is returned. For a description of
input-line-count traps, see the it request below.


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.de xx yy      -           .yy=..    -      Define or redefine the macro xx. The contents of the
                                            macro begin on the next input line. Input lines are
                                            copied in copy mode until the definition is terminated
                                            by a line beginning with .yy, whereupon the macro yy
                                            is called. In the absence of yy, the definition is
                                            terminated by a line beginning with "..". A macro may
                                            contain de requests provided the terminating macros
                                            differ or the contained definition terminator is
                                            concealed. ".." can be concealed as \\.. which will
                                            copy as \.. and be reread as "..".
.am xx yy      -           .yy=..    -      Append to macro (append version of de).
.ds xx string  -           ignored   -      Define a string xx containing string. Any initial
                                            double-quote in string is stripped off to permit
                                            initial blanks.
.as xx string  -           ignored   -      Append string to string xx (append version of ds).
.rm xx         -           ignored   -      Remove request, macro, or string. The name xx is
                                            removed from the name list and any related storage
                                            space is freed. Subsequent references will have no
                                            effect.
.rn xx yy      -           ignored   -      Rename request, macro, or string xx to yy. If yy
                                            exists, it is first removed.
.di xx         -           end       D      Divert output to macro xx. Normal text processing
                                            occurs during diversion except that page offsetting
                                            is not done. The diversion ends when the request di
                                            or da is encountered without an argument; extraneous
                                            requests of this type should not appear when nested
                                            diversions are being used.
.da xx         -           end       D      Divert, appending to xx (append version of di).
.wh N xx       -           -         v      Install a trap to invoke xx at page position N; a
                                            negative N will be interpreted with respect to the
                                            page bottom. Any macro previously planted at N is
                                            replaced by xx. A zero N refers to the top of a page.
                                            In the absence of xx, the first found trap at N, if
                                            any, is removed.
.ch xx N       -           -         v      Change the trap position for macro xx to be N. In
                                            the absence of N, the trap, if any, is removed.
.dt N xx       -           off       D,v    Install a diversion trap at position N in the current
                                            diversion to invoke macro xx. Another dt will
                                            redefine the diversion trap. If no arguments are
                                            given, the diversion trap is removed.
.it N xx       -           off       E      Set an input-line-count trap to invoke the macro xx
                                            after N lines of text input have been read (control
                                            or request lines don’t count). The text may be
                                            in-line text or text interpolated by inline or
                                            trap-invoked macros.
.em xx         none        none      -      The macro xx will be invoked when all input has
                                            ended. The effect is the same as if the contents of
                                            xx had been at the end of the last file processed.


8. Number Registers 


A variety of parameters are available to the user as predefined, named number registers (see Summary
and Index, page 7). In addition, the user may define his own named registers. Register names are one
or two characters long and do not conflict with request, macro, or string names. Except for certain
predefined read-only registers, a number register can be read, written, automatically incremented or
decremented, and interpolated into the input in a variety of formats. One common use of user-defined
registers is to automatically number sections, paragraphs, lines, etc. A number register may be used
any time numerical input is expected or desired and may be used in numerical expressions (§1.4).


Number registers are created and modified using nr, which specifies the name, numerical value, and
the auto-increment size. Registers are also modified if accessed with an auto-incrementing sequence.
If the registers x and xx both contain N and have the auto-increment size M, the following access
sequences have the effect shown:

Sequence     Effect on Register        Value Interpolated

\nx          none                      N
\n(xx        none                      N
\n+x         x incremented by M        N+M
\n-x         x decremented by M        N-M
\n+(xx       xx incremented by M       N+M
\n-(xx       xx decremented by M       N-M
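The access sequences above, together with the relative assignment done by nr, can be modeled as a small object. The `Register` class below is a hypothetical Python sketch rather than troff code. It also assumes that omitting M in nr leaves the auto-increment size unchanged, which the text does not state.

```python
class Register:
    """Hypothetical model of one NROFF/TROFF number register (not real troff code)."""

    def __init__(self, value=0, increment=0):
        self.value = value
        self.increment = increment

    def nr(self, amount, increment=None):
        """Model of `.nr R ±N M`: a leading sign makes N relative to the old value.

        Assumption: omitting M leaves the auto-increment size unchanged.
        """
        if amount[0] in "+-":
            self.value += int(amount)
        else:
            self.value = int(amount)
        if increment is not None:
            self.increment = increment

    def interpolate(self, sequence=""):
        """Model of \\nx (""), \\n+x ("+") and \\n-x ("-"); returns the interpolated value."""
        if sequence == "+":
            self.value += self.increment
        elif sequence == "-":
            self.value -= self.increment
        return self.value


if __name__ == "__main__":
    x = Register(value=5, increment=2)   # N = 5, M = 2
    print(x.interpolate())     # 5  (\nx)
    print(x.interpolate("+"))  # 7  (\n+x)
    print(x.interpolate("-"))  # 5  (\n-x)
```
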


When interpolated, a number register is converted to decimal (default), decimal with leading zeros,
lower-case Roman, upper-case Roman, lower-case sequential alphabetic, or upper-case sequential
alphabetic according to the format specified by af.


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.nr R ±N M     -           -         u      The number register R is assigned the value ±N with
                                            respect to the previous value, if any. The increment
                                            for auto-incrementing is set to M.
.af R c        arabic      -         -      Assign format c to register R. The available formats
                                            are:

                                            Format  Numbering Sequence
                                            1       0,1,2,3,4,5,...
                                            001     000,001,002,003,004,005,...
                                            i       0,i,ii,iii,iv,v,...
                                            I       0,I,II,III,IV,V,...
                                            a       0,a,b,c,...,z,aa,ab,...,zz,aaa,...
                                            A       0,A,B,C,...,Z,AA,AB,...,ZZ,AAA,...

                                            An arabic format having N digits specifies a field
                                            width of N digits (example 2 above). The read-only
                                            registers and the width function (§11.2) are always
                                            arabic.
.rr R          -           ignored   -      Remove register R. If many registers are being
                                            created dynamically, it may become necessary to
                                            remove no longer used registers to recapture internal
                                            storage space for newer registers.
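The af formats follow simple conversion rules: zero-padded decimal, Roman numerals, and a bijective base-26 alphabetic sequence in which z is followed by aa. The Python sketch below models these rules for non-negative values. It is an illustration only, and the helper names are hypothetical. Negative values, and any limits troff places on Roman numerals, are not modeled.

```python
ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
         (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
         (5, "v"), (4, "iv"), (1, "i")]


def to_roman(n):
    """Lower-case Roman numeral for n >= 1."""
    digits = []
    for value, numeral in ROMAN:
        while n >= value:
            digits.append(numeral)
            n -= value
    return "".join(digits)


def to_alpha(n):
    """Sequential alphabetic form for n >= 1: a..z, aa..zz, aaa, ..."""
    letters = []
    while n > 0:
        n, r = divmod(n - 1, 26)
        letters.append(chr(ord("a") + r))
    return "".join(reversed(letters))


def format_register(n, fmt):
    """Interpolate a non-negative n in an af-style format: 1, 001, i, I, a, or A."""
    if fmt in ("i", "I", "a", "A"):
        if n == 0:
            return "0"
        text = to_roman(n) if fmt in ("i", "I") else to_alpha(n)
        return text.upper() if fmt.isupper() else text
    # Arabic: a format of N digits gives a zero-padded field N digits wide.
    return str(n).zfill(len(fmt))


if __name__ == "__main__":
    print([format_register(n, "i") for n in range(6)])  # ['0', 'i', 'ii', 'iii', 'iv', 'v']
```
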


9. Tabs, Leaders, and Fields 


9.1. Tabs and leaders. The ASCII horizontal tab character and the ASCII SOH (hereafter known as the
leader character) can both be used to generate either horizontal motion or a string of repeated
characters. The length of the generated entity is governed by internal tab stops specifiable with ta. The
default difference is that tabs generate motion and leaders generate a string of periods; tc and lc offer
the choice of repeated character or motion. There are three types of internal tab stops: left adjusting,
right adjusting, and centering. In the following table, D is the distance from the current position on the
input line (where a tab or leader was found) to the next tab stop; next-string consists of the input
characters following the tab (or leader) up to the next tab (or leader) or end of line; and W is the width
of next-string.
Tab        Length of motion or      Location of
type       repeated characters      next-string

Left       D                        Following D
Right      D-W                      Right adjusted within D
Centered   D-W/2                    Centered on right end of D


The length of generated motion is allowed to be negative, but that of a repeated character string cannot 
be. Repeated character strings contain an integer number of characters, and any residual distance is 
prepended as motion. Tabs or leaders found after the last tab stop are ignored, but may be used as 
next-string terminators. 
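The table and the rule about residual distance can be restated as a short computation. The Python function below is a hypothetical sketch in which all widths are numbers in the same unit. When a repeated character is requested, it clamps a negative length to zero. That clamp is an assumption, because the text only says such a string cannot be negative.

```python
def tab_output(kind, D, W, char_width=None):
    """Model one tab or leader. Returns (leading_motion, repeat_count).

    kind: "L" (left), "R" (right) or "C" (centered) tab stop.
    D: distance to the next tab stop; W: width of next-string.
    char_width: width of the repetition character, or None for plain motion.
    """
    lengths = {"L": D, "R": D - W, "C": D - W / 2}
    length = lengths[kind]
    if char_width is None:
        return length, 0               # generated motion may be negative
    length = max(length, 0)            # assumption: clamp, since a string cannot be negative
    count = int(length // char_width)  # only whole characters are repeated
    return length - count * char_width, count  # the residue is prepended as motion


if __name__ == "__main__":
    # Right-adjusted next-string 30 units wide, 100 units before the stop,
    # filled with a repeated character 20 units wide.
    print(tab_output("R", 100, 30, char_width=20))  # (10, 3)
```
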


Tabs and leaders are not interpreted in copy mode. \t and \a always generate a non-interpreted tab and 
leader respectively, and are equivalent to actual tabs and leaders in copy mode. 


9.2. Fields. A field is contained between a pair of field delimiter characters, and consists of sub-strings
separated by padding indicator characters. The field length is the distance on the input line from the
position where the field begins to the next tab stop. The difference between the total length of all the
sub-strings and the field length is incorporated as horizontal padding space that is divided among the
indicated padding places. The incorporated padding is allowed to be negative. For example, if the field
delimiter is # and the padding indicator is ^, #^xxx^right# specifies a right-adjusted string with the
string xxx centered in the remaining space.
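The padding arithmetic for the #^xxx^right# example can be checked with a small Python sketch. It assumes a monospaced model in which every character is one unit wide and the padding is shared equally among the padding places. The text does not say how an indivisible remainder is distributed, so the sketch keeps the share as an exact fraction. Both function names are hypothetical.

```python
from fractions import Fraction


def layout_field(field, field_length, pad="^"):
    """Split a field at its padding indicators and share out the padding.

    Assumption: every character is one unit wide (a monospaced model).
    Returns the sub-strings and the padding given to each padding place;
    the padding may be negative.
    """
    pieces = field.split(pad)
    places = len(pieces) - 1
    if places == 0:
        raise ValueError("this sketch needs at least one padding indicator")
    text_width = sum(len(p) for p in pieces)
    return pieces, Fraction(field_length - text_width, places)


def render(pieces, per_place):
    """Join the pieces with whole-unit padding (valid only for integral, non-negative padding)."""
    return (" " * int(per_place)).join(pieces)


if __name__ == "__main__":
    pieces, per_place = layout_field("^xxx^right", 20)  # the #^xxx^right# example
    print(repr(render(pieces, per_place)))  # '      xxx      right'
```
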
Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.ta Nt ...     0.8; 0.5in  none      E,m    Set tab stops and types. t=R, right adjusting; t=C,
                                            centering; t absent, left adjusting. TROFF tab stops
                                            are preset every 0.5in.; NROFF every 0.8in. The stop
                                            values are separated by spaces, and a value preceded
                                            by + is treated as an increment to the previous stop
                                            value.
.tc c          none        none      E      The tab repetition character becomes c, or is
                                            removed specifying motion.
.lc c          .           none      E      The leader repetition character becomes c, or is
                                            removed specifying motion.
.fc a b        off         off       -      The field delimiter is set to a; the padding
                                            indicator is set to the space character or to b, if
                                            given. In the absence of arguments the field
                                            mechanism is turned off.


10. Input and Output Conventions and Character Translations 


10.1. Input character translations. Ways of inputting the graphic character set were discussed in §2.1.
The ASCII control characters horizontal tab (§9.1), SOH (§9.1), and backspace (§10.3) are discussed
elsewhere. The newline delimits input lines. In addition, STX, ETX, ENQ, ACK, and BEL are accepted,
and may be used as delimiters or translated into a graphic with tr (§10.5). All others are ignored.


The escape character \ introduces escape sequences, which cause the following character to mean
another character or to indicate some function. A complete list of such sequences is given in the
Summary and Index on page 6. \ should not be confused with the ASCII control character ESC of the
same name. The escape character \ can be input with the sequence \\. The escape character can be
changed with ec, and all that has been said about the default \ becomes true for the new escape
character. \e can be used to print whatever the current escape character is. If necessary or
convenient, the escape mechanism may be turned off with eo, and restored with ec.


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.ec c          \           \         -      Set escape character to \, or to c, if given.
.eo            on          -         -      Turn escape mechanism off.


10.2. Ligatures. Five ligatures are available in the current TROFF character set: fi, fl, ff, ffi, and ffl.
They may be input (even in NROFF) by \(fi, \(fl, \(ff, \(Fi, and \(Fl respectively. The ligature mode
is normally on in TROFF, and automatically invokes ligatures during input.


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.lg N          off; on     on        -      Ligature mode is turned on if N is absent or
                                            non-zero, and turned off if N=0. If N=2, only the
                                            two-character ligatures are automatically invoked.
                                            Ligature mode is inhibited for request, macro,
                                            string, register, or file names, and in copy mode.
                                            No effect in NROFF.


10.3. Backspacing, underlining, overstriking, etc. Unless in copy mode, the ASCII backspace character is
replaced by a backward horizontal motion having the width of the space character. Underlining as a
form of line-drawing is discussed in §12.4. A generalized overstriking function is described in §12.1.


NROFF automatically underlines characters in the underline font, specifiable with uf, normally that on 
font position 2 (normally Times Italic, see §2.2). In addition to ft and \fF, the underline font may be 
selected by ul and cu. Underlining is restricted to an output-device-dependent subset of reasonable 
characters. 
Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.ul N          off         N=1       E      Underline in NROFF (italicize in TROFF) the next N
                                            input text lines. Actually, switch to underline font,
                                            saving the current font for later restoration; other
                                            font changes within the span of a ul will take
                                            effect, but the restoration will undo the last
                                            change. Output generated by tl (§14) is affected by
                                            the font change, but does not decrement N. If N>1,
                                            there is the risk that a trap interpolated macro may
                                            provide text lines within the span; environment
                                            switching can prevent this.
.cu N          off         N=1       E      A variant of ul that causes every character to be
                                            underlined in NROFF. Identical to ul in TROFF.
.uf F          Italic      Italic    -      Underline font set to F. In NROFF, F may not be on
                                            position 1 (initially Times Roman).


10.4. Control characters. Both the control character . and the no-break control character ' may be
changed, if desired. Such a change must be compatible with the design of any macros used in the span
of the change, and particularly of any trap-invoked macros.


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.cc c          .           .         E      The basic control character is set to c, or reset
                                            to ".".
.c2 c          '           '         E      The nobreak control character is set to c, or reset
                                            to "'".


10.5. Output translation. One character can be made a stand-in for another character using tr. All text
processing (e.g., character comparisons) takes place with the input (stand-in) character, which appears to
have the width of the final character. The graphic translation occurs at the moment of output (including
diversion).


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.tr abcd....   none        -         O      Translate a into b, c into d, etc. If an odd number
                                            of characters is given, the last one will be mapped
                                            into the space character. To be consistent, a
                                            particular translation must stay in effect from input
                                            to output time.
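The pairing rule for the tr argument, including the odd-length case, fits in a few lines of Python. The sketch is hypothetical and models only the character mapping, not widths or timing. For example, with .tr ~ a tilde is handled as an ordinary character during processing, so it does not split a word, yet it prints as a space.

```python
def parse_tr(argument):
    """Model of the .tr argument: pairs a->b, c->d, ...; an odd last character maps to space."""
    if len(argument) % 2:
        argument += " "
    return {argument[i]: argument[i + 1] for i in range(0, len(argument), 2)}


def translate(text, table):
    """Apply the translation the way output would see it (character widths are not modeled)."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    table = parse_tr("~")                   # .tr ~ : a tilde prints as a space
    print(translate("two~words", table))    # two words
```
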


10.6. Transparent throughput. An input line beginning with a \! is read in copy mode and transparently 
output (without the initial \!); the text processor is otherwise unaware of the line’s presence. This 
mechanism may be used to pass control information to a post-processor or to imbed control lines in a 
macro created by a diversion. 


10.7. Comments and concealed newlines. An uncomfortably long input line that must stay one line (e.g.,
a string definition, or nofilled text) can be split into many physical lines by ending all but the last one
with the escape \. The sequence \(newline) is always ignored, except in a comment. Comments may
be imbedded at the end of any line by prefacing them with \". The newline at the end of a comment
cannot be concealed. A line beginning with \" will appear as a blank line and behave like .sp 1; a
comment can be on a line by itself by beginning the line with .\".


11. Local Horizontal and Vertical Motions, and the Width Function


11.1. Local Motions. The functions \v'N' and \h'N' can be used for local vertical and horizontal motion
respectively. The distance N may be negative; the positive directions are rightward and downward. A
local motion is one contained within a line. To avoid unexpected vertical dislocations, it is necessary
that the net vertical local motion within a word in filled text and otherwise within a line balance to zero.
The above and certain other escape sequences providing local motion are summarized in the following
table.
Vertical      Effect in              Horizontal    Effect in
Local Motion  TROFF     NROFF        Local Motion  TROFF                        NROFF

\v'N'         Move distance N        \h'N'         Move distance N
\u            ½ em up   ½ line up    \(space)      Unpaddable space-size space
\d            ½ em down ½ line down  \0            Digit-size space
\r            1 em up   1 line up    \|            1/6 em space                 ignored
                                     \^            1/12 em space                ignored


As an example, E² could be generated by the sequence E\s-2\v'-0.4m'2\v'0.4m'\s+2; it should be
noted in this example that the 0.4 em vertical motions are at the smaller size.
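The balance requirement from §11.1 can be checked mechanically for sequences like the one above. The Python sketch below is a hypothetical lint, not a troff feature. It recognizes only \v motions written in ems, and it adds them without regard to point size. That is valid for this example only because both motions occur at the same reduced size. In general, motions made at different sizes cannot be summed in ems.

```python
import re

# Matches \v'N' when N is written in ems, e.g. \v'-0.4m'. Other units are not handled.
V_MOTION = re.compile(r"\\v'([-+]?\d*\.?\d+)m'")


def net_vertical_motion(sequence):
    """Sum the \\v'...m' local motions in an input sequence (in ems)."""
    return sum(float(amount) for amount in V_MOTION.findall(sequence))


if __name__ == "__main__":
    superscript = r"E\s-2\v'-0.4m'2\v'0.4m'\s+2"
    print(net_vertical_motion(superscript))  # 0.0: the motions balance
```
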


11.2. Width Function. The width function \w'string' generates the numerical width of string (in basic
units). Size and font changes may be safely imbedded in string, and will not affect the current
environment. For example, .ti -\w'1. 'u could be used to temporarily indent leftward a distance equal to the
size of the string "1. ".


The width function also sets three number registers. The registers st and sb are set respectively to the
highest and lowest extent of string relative to the baseline; then, for example, the total height of the
string is \n(stu-\n(sbu. In TROFF the number register ct is set to a value between 0 and 3: 0 means
that all of the characters in string were short lower case characters without descenders (like e); 1 means
that at least one character has a descender (like y); 2 means that at least one character is tall (like H);
and 3 means that both tall characters and characters with descenders are present.


11.3. Mark horizontal place. The escape sequence \kx will cause the current horizontal position in the
input line to be stored in register x. As an example, the construction \kxword\h'|\nxu+2u'word will
embolden word by backing up to almost its beginning and overprinting it, resulting in a boldface word.


12. Overstrike, Bracket, Line-drawing, and Zero-width Functions 


12.1. Overstriking. Automatically centered overstriking of up to nine characters is provided by the
overstrike function \o'string'. The characters in string are overprinted with centers aligned; the total
width is that of the widest character. string should not contain local vertical motion. As examples,
\o'e\'' produces é, and \o'\(mo\(sl' produces ∉.


12.2. Zero-width characters. The function \zc will output c without spacing over it, and can be used to
produce left-aligned overstruck combinations. As examples, \z\(ci\(pl will produce ⊕, and
\(br\z\(rn\(ul\(br will produce the smallest possible constructed box.


12.3. Large Brackets. The Special Mathematical Font contains a number of bracket construction pieces
that can be combined into various bracket styles. The function \b'string' may be used to pile up
vertically the characters in string (the first character on top and the last at the bottom); the
characters are vertically separated by 1 em and the total pile is centered 1/2 em above the current
baseline (½ line in NROFF). For example, \b'\(lc\(lf'E\|\b'\(rc\(rf'\x'-0.5m'\x'0.5m' produces an E
enclosed in tall square brackets.




12.4. Line drawing. The function \l'Nc' will draw a string of repeated c's towards the right for a
distance N. (\l is a backslash followed by a lower-case L.) If c looks like a continuation of an expression
for N, it may be insulated from N with a \&. If c is not specified, the _ (baseline rule) is used (underline
character in NROFF). If N is negative, a backward horizontal motion of size N is made before drawing the
string. Any space resulting from N/(size of c) having a remainder is put at the beginning (left end) of the
string. In the case of characters that are designed to be connected such as baseline-rule _, underrule _,
and root-en ‾, the remainder space is covered by overlapping. If N is less than the width of c, a single c is
centered on a distance N. As an example, a macro to underscore a string can be written


.de us
\\$1\l'|0\(ul'
..

or one to draw a box around a string

.de bx
\(br\|\\$1\|\(br\l'|0\(rn'\l'|0\(ul'
..

such that

.us "underlined words"
and
.bx "words in a box"

yield "underlined words" with an underrule beneath them and "words in a box" enclosed in a box.
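The remainder rule for \l described above can be expressed as a small calculation. The Python sketch below is a hypothetical model for N >= 0 with widths in basic units. It does not model the backward motion for negative N or the overlapping used for connected characters such as the underrule.

```python
def draw_line(N, char_width):
    """Model of \\l'Nc' for N >= 0. Returns (leading_space, repeat_count).

    Whole copies of c are drawn; any remainder of N / char_width is left as
    space at the left end. If N is less than the width of c, a single c is
    centred on the distance N (the leading value is then negative).
    Overlapping for connected characters is not modeled.
    """
    if N < 0:
        raise ValueError("this sketch does not model the backward motion for negative N")
    if N < char_width:
        return (N - char_width) / 2, 1
    count, remainder = divmod(N, char_width)
    return remainder, count


if __name__ == "__main__":
    print(draw_line(100, 30))  # (10, 3): three copies, 10 units of space on the left
```
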


The function \L'Nc' will draw a vertical line consisting of the (optional) character c stacked vertically
apart 1 em (1 line in NROFF), with the first two characters overlapped, if necessary, to form a
continuous line. The default character is the box rule | (\(br); the other suitable character is the bold
vertical | (\(bv). The line is begun without any initial motion relative to the current base line. A positive N
specifies a line drawn downward and a negative N specifies a line drawn upward. After the line is drawn
no compensating motions are made; the instantaneous baseline is at the end of the line.


The horizontal and vertical line drawing functions may be used in combination to produce large boxes.
The zero-width box-rule and the ½-em wide underrule were designed to form corners when using 1-em
vertical spacings. For example the macro


.de eb
.sp -1 \"compensate for next automatic base-line spacing
.nf \"avoid possibly overflowing word buffer
\h'-.5n'\L'|\\nau-1'\l'\\n(.lu+1n\(ul'\L'-|\\nau+1'\l'|0u-.5n\(ul' \"draw box
.fi \"restore fill mode
..

will draw a box around some text whose beginning vertical place was saved in number register a (e.g.,
using .mk a) as done for this paragraph.


13. Hyphenation. 


The automatic hyphenation may be switched off and on. When switched on with hy, several variants
may be set. A hyphenation indicator character may be imbedded in a word to specify desired
hyphenation points, or may be prepended to suppress hyphenation. In addition, the user may specify a small
exception word list.


Only words that consist of a central alphabetic string surrounded by (usually null) non-alphabetic
strings are considered candidates for automatic hyphenation. Words that were input containing hyphens
(minus), em-dashes (\(em), or hyphenation indicator characters (such as mother-in-law) are always
subject to splitting after those characters, whether automatic hyphenation is on or off.


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.nh            hyphenate   -         E      Automatic hyphenation is turned off.
.hy N          on,N=1      on,N=1    E      Automatic hyphenation is turned on for N≥1, or off
                                            for N=0. If N=2, last lines (ones that will cause a
                                            trap) are not hyphenated. For N=4 and 8, the last and
                                            first two characters respectively of a word are not
                                            split off. These values are additive; i.e., N=14 will
                                            invoke all three restrictions.
.hc c          \%          \%        E      Hyphenation indicator character is set to c or to the
                                            default \%. The indicator does not appear in the
                                            output.
.hw word1 ...  -           ignored   -      Specify hyphenation points in words with imbedded
                                            minus signs. Versions of a word with terminal s are
                                            implied; i.e., dig-it implies dig-its. This list is
                                            examined initially and after each suffix stripping.
                                            The space available is small: about 128 characters.
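Because the hy values 2, 4, and 8 are additive, each restriction can be tested as a separate bit of N. The Python helper below is a hypothetical decoder that illustrates the arithmetic (for example, 14 = 2 + 4 + 8). It is not part of NROFF/TROFF.

```python
def hy_restrictions(n):
    """Decode the .hy N value into the restrictions it enables.

    Returns None when hyphenation is off (N=0). The values 2, 4 and 8 are
    additive, so they are tested as separate bits.
    """
    if n == 0:
        return None
    return {
        "no_hyphen_on_trap_lines": bool(n & 2),  # last lines (ones that will cause a trap)
        "keep_last_two_chars": bool(n & 4),      # last two characters not split off
        "keep_first_two_chars": bool(n & 8),     # first two characters not split off
    }


if __name__ == "__main__":
    print(hy_restrictions(14))  # all three restrictions enabled
```
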


14. Three Part Titles. 


The titling function tl provides for automatic placement of three fields at the left, center, and right of a
line with a title-length specifiable with lt. tl may be used anywhere, and is independent of the normal
text collecting process. A common use is in header and footer macros.


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.tl 'left'center'right'
               -           -         -      The strings left, center, and right are respectively
                                            left-adjusted, centered, and right-adjusted in the
                                            current title-length. Any of the strings may be empty,
                                            and overlapping is permitted. If the page-number
                                            character (initially %) is found within any of the
                                            fields it is replaced by the current page number
                                            having the format assigned to register %. Any
                                            character may be used as the string delimiter.
.pc c          %           off       -      The page number character is set to c, or removed.
                                            The page-number register remains %.
.lt ±N         6.5 in      previous  E,m    Length of title set to ±N. The line-length and the
                                            title-length are independent. Indents do not apply to
                                            titles; page-offsets do.
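The placement rules for tl can be illustrated with a monospaced model in Python. The two functions below are hypothetical. They treat the title-length as a number of character cells and let overlapping fields overwrite one another. This is a simplification, since a typesetter would overprint the characters instead.

```python
def parse_tl(argument):
    """Split a .tl argument such as ''Page %'' into (left, center, right)."""
    delim = argument[0]
    parts = argument[1:].split(delim)
    left, center, right = (parts + ["", "", ""])[:3]
    return left, center, right


def three_part_title(left, center, right, length, page, pc="%"):
    """Monospaced model of .tl in a title-length of `length` characters.

    pc is the page-number character (None if it has been removed with pc).
    Overlapping fields simply overwrite each other here (an assumption).
    """
    if pc:
        left, center, right = (s.replace(pc, str(page)) for s in (left, center, right))
    line = [" "] * length

    def place(start, text):
        for i, ch in enumerate(text):
            if 0 <= start + i < length:
                line[start + i] = ch

    place(0, left)
    place((length - len(center)) // 2, center)
    place(length - len(right), right)
    return "".join(line)


if __name__ == "__main__":
    print(repr(three_part_title(*parse_tl("'L'%'R'"), 5, 7)))  # 'L 7 R'
```
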


15. Output Line Numbering. 


Automatic sequence numbering of output lines may be requested with nm. When in effect, a
three-digit, arabic number plus a digit-space is prepended to output text lines. The text lines are
thus offset by four digit-spaces, and otherwise retain their line length; a reduction in line length
may be desired to keep the right margin aligned with an earlier margin. Blank lines, other vertical
spaces, and lines generated by tl are not numbered. Numbering can be temporarily suspended with
nn, or with an .nm followed by a later .nm +0. In addition, a line number indent I, and the
number-text separation S may be specified in digit-spaces. Further, it can be specified that only
those line numbers that are multiples of some number M are to be printed (the others will appear
as blank number fields).


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.nm ±N M S I   off         -         E      Line number mode. If ±N is given, line numbering is
                                            turned on, and the next output line numbered is
                                            numbered ±N. Default values are M=1, S=1, and I=0.
                                            Parameters corresponding to missing arguments are
                                            unaffected; a non-numeric argument is considered
                                            missing. In the absence of all arguments, numbering
                                            is turned off; the next line number is preserved for
                                            possible further use in number register ln.
.nn N          -           N=1       E      The next N text output lines are not numbered.


As an example, the paragraph portions of this section are numbered with M=3: .nm 1 3 was placed at
the beginning, .nm at the end of the first paragraph, .nm +0 in front of this paragraph, and .nm finally
at the end. Line lengths were also changed (by \w'0000'u) to keep the right side aligned. Another
example is .nm +5 5 x 3, which turns on numbering with the line number of the next line to be 5 greater
than the last numbered line, with M=5, with spacing S untouched, and with the indent I set to 3.
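The numbering layout can be modeled on a list of output lines. The Python function below is a hypothetical sketch. It represents a digit-space as one ordinary space, uses empty strings for blank lines and vertical space (which are neither numbered nor counted), and gives a blank number field to each line whose number is not a multiple of M.

```python
def number_lines(lines, start=1, M=1, S=1, I=0):
    """Monospaced model of .nm N M S I applied to a list of output lines.

    A digit-space is modeled as one ordinary space. Empty lines stand for
    blank lines and vertical space, which are neither numbered nor counted.
    Lines whose number is not a multiple of M get a blank number field.
    """
    numbered = []
    n = start
    for text in lines:
        if not text:
            numbered.append(text)
            continue
        field = f"{n:3d}" if n % M == 0 else "   "
        numbered.append(" " * I + field + " " * S + text)
        n += 1
    return numbered


if __name__ == "__main__":
    for line in number_lines(["first", "second", "third"], M=2):
        print(line)
```

With M=2, only the second line shows its number ("  2 second"). The other lines keep the same four-column offset, so the text stays aligned.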
16. Conditional Acceptance of Input


In the following, c is a one-character, built-in condition name, ! signifies not, N is a numerical
expression, string1 and string2 are strings delimited by any non-blank, non-numeric character not in the
strings, and anything represents what is conditionally accepted.


Request Form                     Notes  Explanation

.if c anything                   -      If condition c true, accept anything as input; in
                                        multi-line case use \{anything\}.
.if !c anything                  -      If condition c false, accept anything.
.if N anything                   u      If expression N > 0, accept anything.
.if !N anything                  u      If expression N ≤ 0, accept anything.
.if 'string1'string2' anything   -      If string1 identical to string2, accept anything.
.if !'string1'string2' anything  -      If string1 not identical to string2, accept
                                        anything.
.ie c anything                   u      If portion of if-else; all above forms (like if).
.el anything                     -      Else portion of if-else.


The built-in condition names are:


Condition
Name        True If

o           Current page number is odd
e           Current page number is even
t           Formatter is TROFF
n           Formatter is NROFF


If the condition c is true, or if the number N is greater than zero, or if the strings compare identically
(including motions and character size and font), anything is accepted as input. If a ! precedes the
condition, number, or string comparison, the sense of the acceptance is reversed.


Any spaces between the condition and the beginning of anything are skipped over. The anything can be 
either a single input line (text, macro, or whatever) or a number of input lines. In the multi-line case, 
the first line must begin with a left delimiter \{ and the last line must end with a right delimiter \}. 


The request ie (if-else) is identical to if except that the acceptance state is remembered. A subsequent
and matching el (else) request then uses the reverse sense of that state. ie-el pairs may be nested.
Some examples are:

.if e .tl ''Even Page %''

which outputs a title if the page number is even; and

.ie \n%>1 \{\
'sp 0.5i
.tl ''Page %''
'sp |1.2i \}
.el .sp |2.5i

which treats page 1 differently from other pages.
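The remembered-state behavior of ie and el, and the way nested pairs match, can be modeled as a stack. The Python sketch below is hypothetical. The `IfElse` class stands in for the formatter's internal state, and `builtin_condition` evaluates the four built-in names listed above.

```python
def builtin_condition(name, page, formatter):
    """The built-in condition names o, e, t and n."""
    return {
        "o": page % 2 == 1,
        "e": page % 2 == 0,
        "t": formatter == "troff",
        "n": formatter == "nroff",
    }[name]


class IfElse:
    """Model of .ie/.el: each .ie remembers its result until a matching .el uses it."""

    def __init__(self):
        self._pending = []

    def ie(self, condition):
        self._pending.append(condition)
        return condition  # True means the .ie's anything is accepted

    def el(self):
        return not self._pending.pop()  # the reverse sense of the remembered state


if __name__ == "__main__":
    state = IfElse()
    state.ie(builtin_condition("e", 4, "nroff"))  # outer .ie: page 4 is even
    state.ie(builtin_condition("t", 4, "nroff"))  # inner .ie: not TROFF
    print(state.el(), state.el())                 # True False (inner, then outer)
```

The stack keeps nested pairs matched: each el consumes the state of the most recent ie that has not yet been used.
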
17. Environment Switching. 


A number of the parameters that control the text processing are gathered together into an environment, 
which can be switched by the user. The environment parameters are those associated with requests 
noting E in their Notes column; in addition, partially collected lines and words are in the environment. 
Everything else is global; examples are page-oriented parameters, diversion-oriented parameters, 
number registers, and macro and string definitions. All environments are initialized with default
parameter values. 


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.ev N          N=0         previous  -      Environment switched to environment 0≤N≤2.
                                            Switching is done in push-down fashion so that
                                            restoring a previous environment must be done with
                                            .ev rather than specific reference.
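Push-down switching means that ev with no argument returns to whichever environment was current before the last switch. The Python class below is a hypothetical model of this behavior. It represents each environment as a dictionary of parameters and assumes that an unmatched .ev at the bottom of the stack is simply ignored.

```python
class Environments:
    """Model of .ev: three environments (0, 1, 2) switched in push-down fashion."""

    def __init__(self):
        self.envs = [{}, {}, {}]  # each holds environment parameters, e.g. {"ll": "6.5i"}
        self.stack = [0]          # environment 0 is current at the start

    @property
    def current(self):
        return self.envs[self.stack[-1]]

    def ev(self, n=None):
        if n is None:                # .ev with no argument restores the previous one
            if len(self.stack) > 1:  # assumption: popping past the bottom is ignored
                self.stack.pop()
        elif 0 <= n <= 2:
            self.stack.append(n)
        else:
            raise ValueError("environment must satisfy 0 <= N <= 2")


if __name__ == "__main__":
    e = Environments()
    e.current["ll"] = "6.5i"
    e.ev(1)
    e.current["ll"] = "3i"  # only environment 1 changes
    e.ev()
    print(e.current["ll"])  # 6.5i
```
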


18. Insertions from the Standard Input 


The input can be temporarily switched to the system standard input with rd, which will switch back 
when two newlines in a row are found (the extra blank line is not used). This mechanism is intended 
for insertions in form-letter-like documentation. On UNIX, the standard input can be the user’s key- 
board, a pipe, or a file. 


Request        Initial    If No       Notes Explanation
Form           Value      Argument

.rd prompt     -          prompt=BEL  -     Read insertion from the standard input until two
                                            newlines in a row are found. If the standard input is
                                            the user’s keyboard, prompt (or a BEL) is written onto
                                            the user’s terminal. rd behaves like a macro, and
                                            arguments may be placed after prompt.
.ex            -          -           -     Exit from NROFF/TROFF. Text processing is terminated
                                            exactly as if all input had ended.


If insertions are to be taken from the terminal keyboard while output is being printed on the terminal,
the command line option -q will turn off the echoing of keyboard input and prompt only with BEL.
The regular input and insertion input cannot simultaneously come from the standard input.


As an example, multiple copies of a form letter may be prepared by entering the insertions for all the 
copies in one file to be used as the standard input, and causing the file containing the letter to reinvoke 
itself using nx (§19); the process would ultimately be ended by an ex in the insertion file. 
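The end-of-insertion rule used by rd (two newlines in a row) is what lets one insertion file hold the entries for many copies of a letter. The Python sketch below is a hypothetical model. Each call consumes lines up to and including the first blank line, or up to end of input, and returns them without that blank line, so the next call starts with the next copy's insertion.

```python
import io


def read_insertion(stream):
    """Model of .rd: read lines until two newlines in a row (a blank line) or end of input.

    The blank line is consumed but not returned, so the next call starts after it.
    """
    lines = []
    while True:
        line = stream.readline()
        if line in ("", "\n"):  # end of input, or the second newline of a pair
            return lines
        lines.append(line.rstrip("\n"))


if __name__ == "__main__":
    insertions = io.StringIO("Dear Ann,\n\nDear Bob,\n\n")
    print(read_insertion(insertions))  # ['Dear Ann,']
    print(read_insertion(insertions))  # ['Dear Bob,']
```
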


19. Input/Output File Switching 


Request        Initial    If No       Notes Explanation
Form           Value      Argument

.so filename   -          -           -     Switch source file. The top input (file reading)
                                            level is switched to filename. The effect of an so
                                            encountered in a macro is not felt until the input
                                            level returns to the file level. When the new file
                                            ends, input is again taken from the original file.
                                            so’s may be nested.
.nx filename   -          end-of-file -     Next file is filename. The current file is considered
                                            ended, and the input is immediately switched to
                                            filename.
.pi program    -          -           -     Pipe output to program (NROFF only). This request
                                            must occur before any printing occurs. No arguments
                                            are transmitted to program.


20. Miscellaneous 


Request        Initial     If No     Notes  Explanation
Form           Value       Argument

.mc c N        -           off       E,m    Specifies that a margin character c appear a distance
                                            N to the right of the right margin after each
                                            non-empty text line (except those produced by tl). If
                                            the output line is too long (as can happen in nofill
                                            mode) the character will be appended to the line. If
                                            N is not given, the previous N is used; the initial N
                                            is 0.2 inches in NROFF and 1 em in TROFF. The margin
                                            character used with this paragraph was a 12-point
                                            box-rule.
.tm string     -           newline   -      After skipping initial blanks, string (rest of the
                                            line) is read in copy mode and written on the user’s
                                            terminal.
.ig yy         -           .yy=..    -      Ignore input lines. ig behaves exactly like de (§7)
                                            except that the input is discarded. The input is read
                                            in copy mode, and any auto-incremented registers will
                                            be affected.
.pm t          -           all       -      Print macros. The names and sizes of all of the
                                            defined macros and strings are printed on the user’s
                                            terminal; if t is given, only the total of the sizes
                                            is printed. The sizes are given in blocks of 128
                                            characters.
.fl            -           -         B      Flush output buffer. Used in interactive debugging to
                                            force output.


21. Output and Error Messages. 


The output from tm, pm, and the prompt from rd, as well as various error messages, are written onto
UNIX’s standard message output. The latter is different from the standard output, where NROFF
formatted output goes. By default, both are written onto the user’s terminal, but they can be independently
redirected.


Various error conditions may occur during the operation of NROFF and TROFF. Certain less serious
errors having only local impact do not cause processing to terminate. Two examples are word overflow,
caused by a word that is too large to fit into the word buffer (in fill mode), and line overflow, caused by
an output line that grew too large to fit in the line buffer; in both cases, a message is printed, the
offending excess is discarded, and the affected word or line is marked at the point of truncation with a *
in NROFF and a ⇐ in TROFF. The philosophy is to continue processing, if possible, on the grounds
that output useful for debugging may be produced. If a serious error occurs, processing terminates, and
an appropriate message is printed. Examples are the inability to create, read, or write files, and the
exceeding of certain internal limits that make future output unlikely to be useful.
TUTORIAL EXAMPLES


T1. Introduction


Although NROFF and TROFF have by design a 
syntax reminiscent of earlier text processors* 
with the intent of easing their use, it is almost 
always necessary to prepare at least a small set of
macro definitions to describe most documents. 
Such common formatting needs as page margins 
and footnotes are deliberately not built into 
NROFF and TROFF. Instead, the macro and 
string definition, number register, diversion, 
environment switching, page-position trap, and 
conditional input mechanisms provide the basis 
for user-defined implementations. 


The examples to be discussed are intended to be 
useful and somewhat realistic, but won’t neces- 
sarily cover all relevant contingencies. Explicit 
numerical parameters are used in the examples to 
make them easier to read and to illustrate typical 
values. In many cases, number registers would 
really be used to reduce the number of places 
where numerical information is kept, and to con- 
centrate conditional parameter initialization like 
that which depends on whether TROFF or NROFF 
is being used. 


T2. Page Margins 


As discussed in §3, header and footer macros are 
usually defined to describe the top and bottom 
page margin areas respectively. A trap is planted 
at page position 0 for the header, and at -N (N
from the page bottom) for the footer. The sim- 
plest such definitions might be 


.de hd          \"define header
'sp 1i
..              \"end definition
.de fo          \"define footer
'bp
..              \"end definition
.wh 0 hd
.wh -1i fo


which provide blank 1 inch top and bottom margins.
The header will occur on the first page only if
the definition and trap exist prior to the initial
pseudo-page transition (§3). In fill mode, the
output line that springs the footer trap was
typically forced out because some part or whole
word didn’t fit on it. If anything in the footer
and header that follows causes a break, that word
or part word will be forced out. In this and other
examples, requests like bp and sp that normally
cause breaks are invoked using the no-break
control character ' to avoid this. When the
header/footer design contains material requiring
independent text processing, the environment
may be switched, avoiding most interaction with
the running text.

*For example: P. A. Crisman, Ed., The Compatible Time-Sharing System, MIT Press, 1965, Section AH9.01
(Description of RUNOFF program on MIT’s CTSS system).
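The trap arithmetic behind the simple hd/fo definitions can be sketched as a toy Python model. It is not part of NROFF or TROFF and rests on several assumptions. It uses TROFF's 432 basic units per inch and an 11-inch page. Each output line advances the page position by one vertical spacing. It ignores the pseudo-page transition and break behavior. The function names are hypothetical.

```python
import math

UNITS_PER_INCH = 432  # assumed TROFF resolution: basic units per inch


def trap_position(pos: int, page_length: int) -> int:
    """Resolve a .wh position; a negative value is measured up from the page bottom."""
    return pos if pos >= 0 else page_length + pos


def lines_before_footer(page_length: int, top: int, footer_pos: int, v: int) -> int:
    """Count text lines output before the footer trap is sprung.

    Text starts at page position `top` (where the header leaves off) and each
    output line advances the position by `v`. The line that reaches or passes
    the trap position springs the trap, so it is counted as printed.
    """
    trap = trap_position(footer_pos, page_length)
    return math.ceil((trap - top) / v)


page = 11 * UNITS_PER_INCH    # 4752 units
one_inch = UNITS_PER_INCH     # 'sp 1i in hd, .wh -1i fo
v = 72                        # 12-point spacing at 6 units per point
print(trap_position(-one_inch, page))                      # 4320
print(lines_before_footer(page, one_inch, -one_inch, v))   # 54
```

The model makes the point of the negative `.wh` argument explicit: the footer position depends only on the page length and the margin, not on where the header left off.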


A more realistic example would be 


.de hd                  \"header
.if t .tl '\(rn''\(rn'  \"troff cut mark
.if \\n%>1 \{\
'sp |0.5i-1             \"tl base at 0.5i
.tl ''- % -''           \"centered page number
.ps                     \"restore size
.ft                     \"restore font
.vs \}                  \"restore vs
'sp |1.0i               \"space to 1.0i
.ns                     \"turn on no-space mode
..
.de fo                  \"footer
.ps 10                  \"set footer/header size
.ft R                   \"set font
.vs 12p                 \"set base-line spacing
.if \\n%=1 \{\
'sp |\\n(.pu-0.5i-1     \"tl base 0.5i up
.tl ''- % -'' \}        \"first page number
'bp
..
.wh 0 hd
.wh -1i fo


which sets the size, font, and base-line spacing
for the header/footer material, and ultimately
restores them. The material in this case is a page
number at the bottom of the first page and at the
top of the remaining pages. If TROFF is used, a
cut mark is drawn in the form of root-en’s at each
margin. The sp’s refer to absolute positions to
avoid dependence on the base-line spacing.
Another reason for this in the footer is that the
footer is invoked by printing a line whose vertical
spacing swept past the trap position by possibly as
much as the base-line spacing. The no-space
mode is turned on at the end of hd to render
ineffective accidental occurrences of sp at the top
of the running text.
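The absolute-position arithmetic behind the footer's `'sp |\\n(.pu-0.5i-1` can be checked with a short sketch. It assumes TROFF basic units (432 per inch, 6 per point). It models only the rule the comment relies on: after spacing to an absolute position, the next output line's base line falls one vertical spacing lower. The function name is hypothetical.

```python
UNITS_PER_INCH = 432   # assumed TROFF resolution
UNITS_PER_POINT = 6    # 432 / 72


def footer_title_baseline(page_length: int, margin: int, v: int) -> tuple:
    """Model  'sp |\\n(.pu-0.5i-1  followed by one .tl line.

    Returns (position reached by the 'sp, base line of the title).
    The trailing -1 (one v) cancels the spacing that the .tl line adds.
    """
    target = page_length - margin - v
    return target, target + v


page = 11 * UNITS_PER_INCH
half_inch = UNITS_PER_INCH // 2
print(footer_title_baseline(page, half_inch, 12 * UNITS_PER_POINT))  # (4464, 4536)
```

Under this model the title's base line always lands `margin` above the page bottom, whatever spacing is in effect.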


The above method of restoring size, font, etc. 
presupposes that such requests (that set previous 
value) are not used in the running text. A better 
scheme is to save and restore both the current and
previous values as shown for size in the follow- 
ing: 


.de fo
.nr s1 \\n(.s           \"current size
.ps
.nr s2 \\n(.s           \"previous size
.  ---                  \"rest of footer
..
.de hd
.  ---                  \"header stuff
.ps \\n(s2              \"restore previous size
.ps \\n(s1              \"restore current size
..
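The save-and-restore scheme can be traced with a toy model of the current/previous size pair. The model assumes two behaviors. `.ps N` makes the old current size the previous one. `.ps` with no argument swaps back to the previous size. `SizeState` and `footer_then_header` are hypothetical names, not formatter internals.

```python
class SizeState:
    """Toy model of the point-size pair kept by .ps."""

    def __init__(self, size: int):
        self.cur = size
        self.prev = size

    def ps(self, size=None):
        """.ps N sets a new size; .ps alone returns to the previous size."""
        new = self.prev if size is None else size
        self.prev, self.cur = self.cur, new


def footer_then_header(state: SizeState, footer_size: int) -> None:
    # fo: remember the current size, step back once to read the previous one
    s1 = state.cur          # .nr s1 \\n(.s
    state.ps()              # .ps
    s2 = state.cur          # .nr s2 \\n(.s
    state.ps(footer_size)   # rest of footer uses its own size
    # hd: re-establish the previous size first, then the current one
    state.ps(s2)            # .ps \\n(s2
    state.ps(s1)            # .ps \\n(s1


text = SizeState(12)
text.ps(10)                 # running text: current 10, previous 12
footer_then_header(text, 8)
print(text.cur, text.prev)  # 10 12
```

Compare the simpler method under the same model. A footer that only does `.ps 8`, followed by a header that restores with a bare `.ps`, gives back the current size (10) but leaves 8 as the previous size.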


Page numbers may be printed in the bottom mar- 
gin by a separate macro triggered during the 
footer’s page ejection: 


.de bn                  \"bottom number
.tl ''- % -''           \"centered page number
..
.wh -0.5i-1v bn         \"tl base 0.5i up


T3. Paragraphs and Headings 


The housekeeping associated with starting a new 
paragraph should be collected in a paragraph 
macro that, for example, does the desired 
preparagraph spacing, forces the correct font, 
size, base-line spacing, and indent, checks that 
enough space remains for more than one line, and 
requests a temporary indent. 


.de pg                  \"paragraph
.br                     \"break
.ft R                   \"force font,
.ps 10                  \"size,
.vs 12p                 \"spacing,
.in 0                   \"and indent
.sp 0.4                 \"prespace
.ne 1+\\n(.Vu           \"want more than 1 line
.ti 0.2i                \"temp indent
..


The first break in pg will force out any previous 
partial lines, and must occur before the vs. The 
forcing of font, etc. is partly a defense against 
prior error and partly to permit things like sec- 
tion heading macros to set parameters only once. 


The prespacing parameter is suitable for TROFF; 
a larger space, at least as big as the output device 
vertical resolution, would be more suitable in 
NROFF. The choice of remaining space to test 
for in the ne is the smallest amount greater than 
one line (the .V is the available vertical resolu- 
tion). 


A macro to automatically number section head- 
ings might look like: 


.de sc                  \"section
.  ---                  \"force font, etc.
.sp 0.4                 \"prespace
.ne 2.4+\\n(.Vu         \"want 2.4+ lines
.fi
\\n+S.
..
.nr S 0 1               \"init S


The usage is .sc, followed by the section heading
text, followed by .pg. The ne test value includes
one line of heading, 0.4 line in the following pg,
and one line of the paragraph text. A word
consisting of the next section number and a period is
produced to begin the heading line. The format
of the number may be set by af (§8).
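The ne amounts used by pg and sc can be written out as arithmetic. The sketch below assumes sizes in basic units and uses exact fractions. It ignores the rounding the formatters apply. The vertical resolution of 3 units in the example is a hypothetical value chosen only for illustration.

```python
from fractions import Fraction


def ne_amount(lines, v: int, V: int) -> Fraction:
    """Space requested by  .ne <lines>+\\n(.Vu  in basic units.

    `lines` is measured in the current vertical spacing v; adding the
    vertical resolution V makes the request the smallest amount strictly
    greater than that many lines.
    """
    return Fraction(lines) * v + V


# pg: just more than one line
PG_LINES = Fraction(1)
# sc: one heading line + 0.4 line prespace in pg + one line of paragraph text
SC_LINES = 1 + Fraction(2, 5) + 1

v, V = 72, 3   # 12-point spacing in TROFF units; V is a hypothetical resolution
print(ne_amount(PG_LINES, v, V))   # 75
print(ne_amount(SC_LINES, v, V))   # 879/5
```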


Another common form is the labeled, indented 
paragraph, where the label protrudes left into the 
indent space. 


.de lp                  \"labeled paragraph
.pg
.in 0.5i                \"paragraph indent
.ta 0.2i 0.5i           \"label, paragraph
.ti 0
\t\\$1\t\c              \"flow into paragraph
..


The intended usage is ".lp label"; label will begin
at 0.2 inch, and cannot exceed a length of
0.3 inch without intruding into the paragraph.
The label could be right adjusted against 0.4 inch
by setting the tabs instead with .ta 0.4iR 0.5i.
The last line of lp ends with \c so that it will
become a part of the first line of the text that
follows.
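Whether a label intrudes into the paragraph follows from the two tab settings. This sketch works in inches with exact fractions. It models only the label's horizontal extent, not tab processing itself. `label_extent` and `label_fits` are hypothetical helpers.

```python
from fractions import Fraction

PARA_INDENT = Fraction(1, 2)   # .in 0.5i


def label_extent(width, tab=Fraction(1, 5), right_adjust=False):
    """Return (start, end) of a label in inches.

    .ta 0.2i 0.5i    -> the label starts at the 0.2i stop;
    .ta 0.4iR 0.5i   -> the label ends at the right-adjusting 0.4i stop.
    """
    width = Fraction(width)
    if right_adjust:
        return tab - width, tab
    return tab, tab + width


def label_fits(width, **kwargs) -> bool:
    """True if the label ends at or before the 0.5i paragraph indent."""
    return label_extent(width, **kwargs)[1] <= PARA_INDENT


print(label_fits("0.3"))    # True: 0.2 + 0.3 reaches the indent exactly
print(label_fits("0.35"))   # False: intrudes into the paragraph
print(label_extent("0.25", tab=Fraction(2, 5), right_adjust=True))
```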


T4. Multiple Column Output 


The production of multiple column pages 
requires the footer macro to decide whether it 
was invoked by other than the last column, so 
that it will begin a new column rather than pro- 
duce the bottom margin. The header can initial- 
ize a column register that the footer will incre- 
ment and test. The following is arranged for two 
columns, but is easily modified for more. 
.de hd                  \"header
.nr cl 0 1              \"init column count
.mk                     \"mark top of text
..
.de fo                  \"footer
.ie \\n+(cl<2 \{\
.po +3.4i               \"next column; 3.1+0.3
.rt                     \"back to mark
.ns \}                  \"no-space mode
.el \{\
.po \\nMu               \"restore left margin
'bp \}
..
.ll 3.1i                \"column width
.nr M \\n(.o            \"save left margin


Typically a portion of the top of the first page
contains full width text; the request for the
narrower line length, as well as another .mk, would
be made where the two column output was to
begin.
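The footer's column decision can be traced with a small state machine. This Python sketch follows the logic of hd and fo above; it does not model the formatter itself. The dictionary keys mirror the registers cl and M and the page offset. Distances are in inches, using exact fractions.

```python
from fractions import Fraction

COLUMN_STEP = Fraction("3.1") + Fraction("0.3")   # column width + gutter = 3.4i


def run_header(state: dict) -> None:
    state["cl"] = 0                     # .nr cl 0 1


def run_footer(state: dict, ncols: int = 2) -> str:
    """One trap-invoked footer; returns the action taken."""
    state["cl"] += 1                    # \\n+(cl
    if state["cl"] < ncols:
        state["po"] += COLUMN_STEP      # .po +3.4i
        return "rt"                     # back to mark: start next column
    state["po"] = state["M"]            # .po \\nMu
    return "bp"                         # bottom margin, then new page


state = {"po": Fraction(1), "M": Fraction(1), "cl": 0}
run_header(state)
print(run_footer(state), state["po"])   # rt 22/5
print(run_footer(state), state["po"])   # bp 1
```

With `ncols=3` the footer returns rt, rt, bp. In the macros, the same change amounts to altering the comparison value in the `.ie` test.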

T5. Footnote Processing


The footnote mechanism to be described is used 
by imbedding the footnotes in the input text at 
the point of reference, demarcated by an initial 
.fn and a terminal .ef: 


.fn
Footnote text and control lines...
.ef


In the following, footnotes are processed in a 
separate environment and diverted for later
printing in the space immediately prior to the 
bottom margin. There is provision for the case 
where the last collected footnote doesn’t com- 
pletely fit in the available space. 


.de hd                  \"header
.nr x 0 1               \"init footnote count
.nr y 0-\\nb            \"current footer place
.ch fo -\\nbu           \"reset footer trap
.if \\n(dn .fz          \"leftover footnote
..
.de fo                  \"footer
.nr dn 0                \"zero last diversion size
.if \\nx \{\
.ev 1                   \"expand footnotes in ev1
.nf                     \"retain vertical size
.FN                     \"footnotes
.rm FN                  \"delete it
.if "\\n(.z"fy" .di     \"end overflow diversion
.nr x 0                 \"disable fx
.ev \}                  \"pop environment
'bp
..
.de fx                  \"process footnote overflow
.if \\nx .di fy         \"divert overflow
..
.de fn                  \"start footnote
.da FN                  \"divert (append) footnote
.ev 1                   \"in environment 1
.if \\n+x=1 .fs         \"if first, include separator
.fi                     \"fill mode
..
.de ef                  \"end footnote
.br                     \"finish output
.nr z \\n(.v            \"save spacing
.ev                     \"pop ev
.di                     \"end diversion
.nr y -\\n(dn           \"new footer position,
.if \\nx=1 .nr y -(\\n(.v-\\nz) \
                        \"uncertainty correction
.ch fo \\nyu            \"y is negative
.if (\\n(nl+1v)>(\\n(.p+\\ny) \
.ch fo \\n(nlu+1v       \"it didn’t fit
..
.de fs                  \"separator
\l'1i'                  \"1 inch rule
.br
..
.de fz                  \"get leftover footnote
.fn
.nf                     \"retain vertical size
.fy                     \"where fx put it
.ef
..
.nr b 1.0i              \"bottom margin size
.wh 0 hd                \"header trap
.wh 12i fo              \"footer trap, temp position
.wh -\\nbu fx           \"fx at footer position
.ch fo -\\nbu           \"conceal fx with fo


The header hd initializes a footnote count regis- 
ter x, and sets both the current footer trap posi- 
tion register y and the footer trap itself to a nom- 
inal position specified in register b. In addition, 
if the register dn indicates a leftover footnote, fz 
is invoked to reprocess it. The footnote start 
macro fn begins a diversion (append) in environ- 
ment 1, and increments the count x; if the count 
is one, the footnote separator fs is interpolated. 
The separator is kept in a separate macro to per- 
mit user redefinition. The footnote end macro ef 
restores the previous environment and ends the 
diversion after saving the spacing size in register 
z. y is then decremented by the size of the
footnote, available in dn; then on the first foot-
note, y is further decremented by the difference
in vertical base-line spacings of the two environ-
ments, to prevent the late triggering of the footer
trap from causing the last line of the combined
footnotes to overflow. The footer trap is then set
to the lower (on the page) of y or the current
page position (nl) plus one line, to allow for
printing the reference line. If indicated by x, the
footer fo rereads the footnotes from FN in nofill
mode in environment 1, and deletes FN. If the
footnotes were too large to fit, the macro fx will 
be trap-invoked to redivert the overflow into fy, 
and the register dn will later indicate to the 
header whether fy is empty. Both fo and fx are 
planted in the nominal footer trap position in an 
order that causes fx to be concealed unless the fo 
trap is moved. The footer then terminates the 
overflow diversion, if necessary, and zeros x to 
disable fx, because the uncertainty correction 
together with a not-too-late triggering of the 
footer can result in the footnote rereading finish- 
ing before reaching the fx trap. 
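The footer-trap bookkeeping done by ef can be checked numerically. The sketch below assumes TROFF basic units and treats each footnote's diverted height (dn) as given. For simplicity it uses a single current position nl. The function is hypothetical and only mirrors the register arithmetic described above.

```python
def footnote_trap(page_len, b, v_text, v_note, heights, nl):
    """Absolute position of the fo trap after the footnotes in `heights`.

    page_len: page length (.p); b: bottom margin (register b)
    v_text, v_note: base-line spacing of the text and footnote environments
    heights: diverted size (dn) of each footnote, in order
    nl: current page position when the last .ef is processed
    """
    y = -b                                # .nr y 0-\\nb   (in hd)
    trap = page_len + y                   # .ch fo -\\nbu
    for count, dn in enumerate(heights, start=1):
        y -= dn                           # .nr y -\\n(dn
        if count == 1:
            y -= v_text - v_note          # uncertainty correction
        # .ch fo \\nyu, moved down to nl+1v if the reference line won't fit
        trap = max(page_len + y, nl + v_text)
    return trap


# 11i page, 1i bottom margin, 12p text, 10p footnotes (6 units per point)
print(footnote_trap(4752, 432, 72, 60, [120], 2000))   # 4188
print(footnote_trap(4752, 432, 72, 60, [120], 4150))   # 4222
```

The second call shows the "it didn't fit" case. The reference line at nl would fall below the computed footnote area, so the trap is pushed down to nl plus one line.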


A good exercise for the student is to combine 
the multiple-column and footnote mechanisms. 


T6. The Last Page 


After the last input file has ended, NROFF and 
TROFF invoke the end macro (§7), if any, and 
when it finishes, eject the remainder of the page. 
During the eject, any traps encountered are pro- 
cessed normally. At the end of this last page, 
processing terminates unless a partial line, word,
or partial word remains. If it is desired that
another page be started, the end-macro 


.de en                  \"end-macro
\c
'bp
..
.em en


will deposit a null partial word, and effect
another last page. 
Table I


Font Style Examples 


The following fonts are printed in 12-point, with a vertical spacing of 14-point, and with non-
alphanumeric characters separated by 1/4 em space. The Special Mathematical Font was specially
prepared for Bell Laboratories by Graphic Systems, Inc. of Hudson, New Hampshire. The Times
Roman, Italic, and Bold are among the many standard fonts available from that company.


Times Roman

abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
[punctuation and special-character samples]

Times Italic

abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
[punctuation and special-character samples]

Times Bold

abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
[punctuation and special-character samples]

Special Mathematical Font

[Greek letters and mathematical and other special characters (named in Table II)]
Table II


Input Naming Conventions for ’, ‘, and -
and for Non-ASCII Special Characters 


Non-ASCII characters and minus on the standard fonts. 


Char   Input Name   Character Name
’      ’            close quote
‘      ‘            open quote
—      \(em         3/4 Em dash
-      -            hyphen or
-      \(hy         hyphen
−      \-           current font minus
•      \(bu         bullet
□      \(sq         square
_      \(ru         rule
¼      \(14         1/4
½      \(12         1/2
¾      \(34         3/4
fi     \(fi         fi
fl     \(fl         fl
ff     \(ff         ff
ffi    \(Fi         ffi
ffl    \(Fl         ffl
°      \(de         degree
†      \(dg         dagger
′      \(fm         foot mark
¢      \(ct         cent sign
®      \(rg         registered
©      \(co         copyright


Non-ASCII characters and ’, ‘, _, +, -, =, and * on the special font.


The ASCII characters @, #, ", ’, ‘, <, >, \, {, }, ~, ^, and _ exist only on the special font and are
printed as a 1-em space if that font is not mounted. The following characters exist only on the special
font except for the upper case Greek letter names followed by † which are mapped into upper case
English letters in whatever font is mounted on font position one (default Times Roman). The special
math plus, minus, and equals are provided to insulate the appearance of equations from the choice of
standard fonts.


Char   Input Name   Character Name
+      \(pl         math plus
−      \(mi         math minus
=      \(eq         math equals
∗      \(**         math star
§      \(sc         section
´      \(aa         acute accent
`      \(ga         grave accent
_      \(ul         underrule
/      \(sl         slash (matching backslash)
α      \(*a         alpha
β      \(*b         beta
γ      \(*g         gamma
δ      \(*d         delta
ε      \(*e         epsilon
ζ      \(*z         zeta
η      \(*y         eta
θ      \(*h         theta
ι      \(*i         iota
κ      \(*k         kappa
λ      \(*l         lambda
μ      \(*m         mu
ν      \(*n         nu
ξ      \(*c         xi
ο      \(*o         omicron
π      \(*p         pi
ρ      \(*r         rho
σ      \(*s         sigma
ς      \(ts         terminal sigma
τ      \(*t         tau
υ      \(*u         upsilon
φ      \(*f         phi
χ      \(*x         chi
ψ      \(*q         psi
ω      \(*w         omega
Α      \(*A         Alpha†
Β      \(*B         Beta†
Γ      \(*G         Gamma
Δ      \(*D         Delta
Ε      \(*E         Epsilon†
Ζ      \(*Z         Zeta†
Η      \(*Y         Eta†
Θ      \(*H         Theta
Ι      \(*I         Iota†
Κ      \(*K         Kappa†
Λ      \(*L         Lambda
Μ      \(*M         Mu†
Ν      \(*N         Nu†
Ξ      \(*C         Xi
Ο      \(*O         Omicron†
Π      \(*P         Pi
Ρ      \(*R         Rho†
Σ      \(*S         Sigma
Τ      \(*T         Tau†
Υ      \(*U         Upsilon
Φ      \(*F         Phi
Χ      \(*X         Chi†
Ψ      \(*Q         Psi
Ω      \(*W         Omega
√      \(sr         square root
‾      \(rn         root en extender
≥      \(>=         >=
≤      \(<=         <=
≡      \(==         identically equal
≅      \(~=         approx =
∼      \(ap         approximates
≠      \(!=         not equal
→      \(->         right arrow
←      \(<-         left arrow
↑      \(ua         up arrow
↓      \(da         down arrow
×      \(mu         multiply
÷      \(di         divide
±      \(+-         plus-minus
∪      \(cu         cup (union)
∩      \(ca         cap (intersection)
⊂      \(sb         subset of
⊃      \(sp         superset of
⊆      \(ib         improper subset
⊇      \(ip         improper superset
∞      \(if         infinity
∂      \(pd         partial derivative
∇      \(gr         gradient
¬      \(no         not
∫      \(is         integral sign
∝      \(pt         proportional to
∅      \(es         empty set
∈      \(mo         member of
│      \(br         box vertical rule
‡      \(dd         double dagger
☞      \(rh         right hand
☜      \(lh         left hand
(logo) \(bs         Bell System logo
|      \(or         or
○      \(ci         circle
⎧      \(lt         left top of big curly bracket
⎩      \(lb         left bottom
⎫      \(rt         right top
⎭      \(rb         right bot
⎨      \(lk         left center of big curly bracket
⎬      \(rk         right center of big curly bracket
┃      \(bv         bold vertical
⌊      \(lf         left floor (left bottom of big square bracket)
⌋      \(rf         right floor (right bottom)
⌈      \(lc         left ceiling (left top)
⌉      \(rc         right ceiling (right top)
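The two-character naming convention is regular enough to process mechanically. The sketch below expands `\(xx` escapes using a handful of Table II names. The Unicode code points are this sketch's approximations of the typeset characters, and the helper `render` is hypothetical.

```python
import re

# A few input names from Table II, mapped to approximate Unicode characters.
SPECIAL = {
    "em": "\u2014", "bu": "\u2022", "dg": "\u2020", "co": "\u00a9",
    "*a": "\u03b1", "*b": "\u03b2", "*p": "\u03c0", "*S": "\u03a3",
    ">=": "\u2265", "<=": "\u2264", "!=": "\u2260", "->": "\u2192",
}

ESCAPE = re.compile(r"\\\((..)")   # backslash, '(', then a 2-character name


def render(text: str) -> str:
    """Replace each \\(xx escape with its character; unknown names are left as-is."""
    return ESCAPE.sub(lambda m: SPECIAL.get(m.group(1), m.group(0)), text)


print(render(r"\(*a \(<= \(*b"))   # α ≤ β
```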


Options

-h      (NROFF only) Use output tabs during horizontal spacing to speed up output as well as
        to reduce output byte count. Device tab settings are assumed to be every 8 nominal
        character widths. The default setting of input (logical) tabs is also initialized to every
        8 nominal character widths.

-z      Efficiently suppresses formatted output. Only message output will occur (from tm
        requests and diagnostics).

Old Requests

.ad c   The adjustment type indicator c may now also be a number obtained from the ‘‘.j’’
        register (see below).

.so name
        The contents of file name will be interpolated at the point the so request is
        encountered. Previously, the interpolation was done upon return to the file-reading
        input level.

New Request

.ab text
        Prints text on the message output and terminates without further processing. If text
        is missing, ‘‘User Abort.’’ is printed. Does not cause a break. The output buffer is
        flushed.

New Predefined Number Registers

.k      Read-only. Contains the horizontal size of the text portion (without indent) of the
        current partially-collected output line, if any, in the current environment.

.j      Read-only. Indicates the current adjustment mode and type. Can be saved and later
        given to the ad request to restore a previous mode.

.P      Read-only. Contains the value 1 if the current page is being printed, and is zero
        otherwise, i.e., if the current page did not appear in the -o option list.

.L      Read-only. Contains the current line-spacing parameter (the value of the most recent
        ls request).

c.      Provides general register access to the input line-number in the current input file.
        Contains the same value as the read-only ‘‘.c’’ register.
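The value of .P depends on the -o page list. This sketch takes the list syntax from the nroff(I)/troff(I) manual pages rather than from this section: comma-separated pages and ranges N-M, where -N means "from the start" and N- means "to the end". The helpers are hypothetical.

```python
def parse_olist(spec: str) -> list:
    """Parse a -o page list such as "-2,5,8-" into (low, high) ranges."""
    ranges = []
    for item in spec.split(","):
        if "-" in item:
            lo, hi = item.split("-", 1)
            ranges.append((int(lo) if lo else 1, int(hi) if hi else None))
        else:
            ranges.append((int(item), int(item)))
    return ranges


def p_register(page: int, ranges) -> int:
    """Value .P would hold on `page`: 1 if printed, else 0 (no -o: every page)."""
    if ranges is None:
        return 1
    for lo, hi in ranges:
        if lo <= page and (hi is None or page <= hi):
            return 1
    return 0


pages = parse_olist("-2,5,8-")
print([p_register(n, pages) for n in range(1, 10)])   # [1, 1, 0, 0, 1, 0, 0, 1, 1]
```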
1. INTRODUCTION
1.1 Purpose 


This memorandum is the user's guide and reference manual for PWB/MM (or just -mm), a general-
purpose package of text formatting macros for use with the UNIX* text formatters nroff [9] and troff [9].
The purpose of PWB/MM is to provide to the users of PWB/UNIX a unified, consistent, and flexible tool
for producing many common types of documents. Although PWB/UNIX provides other macro packages
for various specialized formats, PWB/MM has become the standard, general-purpose macro package for
most documents.


PWB/MM can be used to produce: 


Letters. 

Reports. 

Technical Memoranda. 
Released Papers. 
Manuals. 

Books. 

etc. 


The uses of PWB/MM range from single-page letters to documents of several hundred pages in length,
such as user guides, design proposals, etc.


1.2 Conventions 


Each section of this memorandum explains a single facility of PWB/MM. In general, the earlier a section
occurs, the more necessary it is for most users. Some of the later sections can be completely ignored if
PWB/MM defaults are acceptable. Likewise, each section progresses from normal-case to special-case
facilities. We recommend reading a section in detail only until there is enough information to obtain
the desired format, then skimming the rest of it, because some details may be of use to just a few
people.


Numbers enclosed in curly brackets ({}) refer to section numbers within this document. For exam- 
ple, this is {1.2}. 


Sections that require knowledge of the formatters {1.4} have a bullet (•) at the end of the section
heading.


In the synopses of macro calls, square brackets ([]) surrounding an argument indicate that it is
optional. Ellipses (...) show that the preceding argument may appear more than once.


A reference of the form name(N) points to page name in section N of the PWB/UNIX User's
Manual [1].


The examples of output in this manual are as produced by troff; nroff output would, of course, look
somewhat different (Appendix D shows both the nroff and troff output for a simple letter). In those
cases in which the behavior of the two formatters is truly different, the nroff action is described first,
with the troff action following in parentheses. For example:

The title is underlined (bold).
means that the title is underlined in nroff and bold in troff.

* UNIX is a Trademark of Bell Laboratories.

1.3 Overall Structure of a Document 


The input for a document that is to be formatted with PWB/MM possesses four major segments, any of
which may be omitted; if present, they must occur in the following order:




• Parameter-setting: This segment sets the general style and appearance of a document. The user can
control page width, margin justification, numbering styles for headings and lists, page headers and
footers {9}, and many other properties of the document. Also, the user can add macros or redefine
existing ones. This segment can be omitted entirely if one is satisfied with default values; it pro-
duces no actual output, but only performs the setup for the rest of the document.


• Beginning: This segment includes those items that occur only once, at the beginning of a document,
e.g., title, author's name, date.


• Body: This segment is the actual text of the document. It may be as small as a single paragraph, or
as large as hundreds of pages. It may have a hierarchy of headings up to seven levels deep {4}.
Headings are automatically numbered (if desired) and can be saved to generate the table of con-
tents. Six additional levels of subordination are provided by a set of list macros for automatic
numbering, alphabetic sequencing, and ‘‘marking’’ of list items {5}. The body may also contain
various types of displays, tables, figures, and footnotes {7, 8}.


• Ending: This segment contains those items that occur once only, at the end of a document.
Included here are signature(s) and lists of notations (e.g., ‘‘copy to’’ lists) {6.12}. Certain macros
may be invoked here to print information that is wholly or partially derived from the rest of the
document, such as the table of contents or the cover sheet for a document {10}.


The existence and size of these four segments vary widely among different document types.
Although a specific item (such as date, title, author name(s), etc.) may be printed in several different 
ways depending on the document type, there is a uniform way of typing it in. 


1.4 Definitions 
The term formatter refers to either of the text-formatting programs nroff and troff.


Requests are built-in commands recognized by the formatters. Although one seldom needs to use
these requests directly {3.9}, this document contains references to some of them. Full details are given
in [9]. For example, the request:


.sp
inserts a blank line in the output.


Macros are named collections of requests. Each macro is an abbreviation for a collection of requests 
that would otherwise require repetition. PWB/MM supplies many macros, and the user can define addi- 
tional ones. Macros and requests share the same set of names and are used in the same way. 


Strings provide character variables, each of which names a string of characters. Strings are often
used in page headers, page footers, and lists. They share the pool of names used by requests and mac-
ros. A string can be given a value via the .ds (define string) request, and its value can be obtained by
referencing its name, preceded by ‘‘\*’’ (for 1-character names) or ‘‘\*(’’ (for 2-character names). For
instance, the string DT in PWB/MM normally contains the current date, so that the input line:


Today is \*(DT.
may result in the following output:
Today is October 31, 1977.


The current date can be replaced, e.g.:
.ds DT 01/01/76
or by invoking a macro designed for that purpose {6.7.1}.
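How the two reference forms pick out a name can be sketched in Python, using the DT example from the text. This toy expander is not how the formatters work internally, since they interpolate while reading input. It assumes that undefined strings expand to nothing. `interpolate_strings` is a hypothetical name.

```python
import re

STRING_REF = re.compile(r"\\\*(?:\((..)|(.))")   # \*x or \*(xx


def interpolate_strings(text: str, strings: dict) -> str:
    """Expand \\*x (1-character name) and \\*(xx (2-character name).

    Undefined strings expand to nothing.
    """
    return STRING_REF.sub(lambda m: strings.get(m.group(1) or m.group(2), ""), text)


strings = {"DT": "October 31, 1977"}
print(interpolate_strings(r"Today is \*(DT.", strings))   # Today is October 31, 1977.
strings["DT"] = "01/01/76"                                # .ds DT 01/01/76
print(interpolate_strings(r"Today is \*(DT.", strings))   # Today is 01/01/76.
```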


Number registers fill the role of integer variables. They are used for flags, for arithmetic, and for 
automatic numbering. A register can be given a value using a .nr request, and be referenced by
preceding its name by ‘‘\n’’ (for 1-character names) or ‘‘\n(’’ (for 2-character names). For example,
the following sets the value of the register d to 1 more than that of the register dd:


.nr d 1+\n(dd
See {13.1} regarding naming conventions for requests, macros, strings, and number registers. 
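The .nr example can be traced with a small sketch that interpolates register references and then evaluates the expression. The formatters evaluate arithmetic strictly left to right, with no operator precedence. The sketch also assumes undefined registers read as 0 and that division truncates toward zero. It omits parentheses, units, and unary signs. All names are hypothetical.

```python
import re

REG_REF = re.compile(r"\\n(?:\((..)|(.))")   # \nx or \n(xx
TOKEN = re.compile(r"\d+|[-+*/]")


def interpolate_registers(text: str, regs: dict) -> str:
    """Replace \\nx and \\n(xx with register values; undefined registers read as 0."""
    return REG_REF.sub(lambda m: str(regs.get(m.group(1) or m.group(2), 0)), text)


def evaluate(expr: str) -> int:
    """Evaluate integer + - * / strictly left to right (no precedence)."""
    tokens = TOKEN.findall(expr)
    value = int(tokens[0])
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        n = int(operand)
        if op == "+":
            value += n
        elif op == "-":
            value -= n
        elif op == "*":
            value *= n
        else:
            q = abs(value) // abs(n)          # truncate toward zero
            value = q if (value < 0) == (n < 0) else -q
    return value


regs = {"dd": 4}
regs["d"] = evaluate(interpolate_registers(r"1+\n(dd", regs))   # .nr d 1+\n(dd
print(regs["d"])          # 5
print(evaluate("1+2*3"))  # 9, not 7
```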
1.5 Prerequisites and Further Reading 


1.5.1 Prerequisites. We assume familiarity with UNIX at the level given in [3] and [4]. Some familiarity
with the request summary in [9] is helpful. 


1.5.2 Further Reading. [9] provides detailed descriptions of formatter capabilities, while [5] provides a
general overview. See [6] (and possibly [7]) for instructions on formatting mathematical expressions.
See tbl(I) and [11] for instructions on formatting tabular data.


Examples of formatted documents and of their respective input, as well as a quick reference to the 
material in this manual are given in [8]. 


2. INVOKING THE MACROS 


This section tells how to access PWB/MM, shows PWB/UNIX command lines appropriate for various out-
put devices, and describes command-line flags for PWB/MM. Note that file names, program names, and
typical command sequences apply only to PWB/UNIX; different names and command lines may have to
be used on other systems.


2.1 The mm Command 


The mm(I) command can be used to print documents using nroff and PWB/MM; this command invokes
nroff with the -mm flag {2.2}. It has options to specify preprocessing by tbl(I) and/or by neqn(I), and
for postprocessing by various output filters. Any arguments or flags that are not recognized by mm(I),
e.g., -rC3, are passed to nroff or to PWB/MM, as appropriate. The options, which can occur in any order
but must appear before the file names, are:


-e      neqn(I) is to be invoked.
-t      tbl(I) is to be invoked.
-c      col(I) is to be invoked.
-12     need 12-pitch mode. Be sure that the pitch switch on the terminal is set to 12.
-300    output is to a DASI300 terminal. This is the default terminal type.
-hp     output is to a HP264x.
-450    output is to a DASI450.
-tn     output is to a GE TermiNet 300.
-tn300  output is to a GE TermiNet 300.
-ti     output is to a Texas Instruments 700 series terminal.
-37     output is to a TELETYPE® Model 37.
2.2 The -mm Flag 


The PWB/MM package can also be invoked by including the -mm flag as an argument to the formatter. 
It causes the file /usr/lib/tmac.m to be read and processed before any other files. This action defines 
the PWB/MM macros, sets default values for various parameters, and initializes the formatter to be ready
to process the files of input text.


2.3 Typical Command Lines 
The prototype command lines are as follows (with the various options explained in {2.4} and in [9]). 
• Text without tables or equations:


mm [options] filename ...
or nroff [options] -mm filename ...
or troff [options] -mm filename ...


• Text with tables:


mm -t [options] filename ...
or tbl filename ... | nroff [options] -mm -
or tbl filename ... | troff [options] -mm -


• Text with equations:


mm -e [options] filename ...
or neqn filename ... | nroff [options] -mm -
or eqn filename ... | troff [options] -mm -


• Text with both tables and equations:


mm -t -e [options] filename ...
or tbl filename ... | neqn | nroff [options] -mm -
or tbl filename ... | eqn | troff [options] -mm -
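The prototypes above follow one pattern. tbl runs first, then neqn (for nroff) or eqn (for troff), and the formatter reads the pipe as its standard input (`-`). The sketch below only builds the command string and does not run anything. `mm_pipeline` is a hypothetical helper.

```python
def mm_pipeline(files, tables=False, equations=False, troff=False, options=()):
    """Return the shell command line for formatting `files` with -mm.

    tbl comes first, then neqn (nroff) or eqn (troff); when a preprocessor
    is used, the formatter reads the pipe ("-") instead of the files.
    """
    names = " ".join(files)
    formatter = " ".join(["troff" if troff else "nroff", *options, "-mm"])
    stages = []
    if tables:
        stages.append("tbl " + names)
    if equations:
        eqn = "eqn" if troff else "neqn"
        # only the first stage of the pipe reads the input files
        stages.append(eqn if stages else eqn + " " + names)
    if not stages:
        return formatter + " " + names
    return " | ".join(stages + [formatter + " -"])


print(mm_pipeline(["memo"], tables=True, equations=True))
# tbl memo | neqn | nroff -mm -
```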


When formatting a document with nroff, the output should normally be processed for a specific type of
terminal, because the output may require some features that are specific to a given terminal, e.g.,
reverse paper motion or half-line paper motion in both directions. Some commonly-used terminal
types and the command lines appropriate for them are given below. See {2.4} as well as gsi(I), hp(I),
col(I), and terminals(VII) for further information.


• DASI300 (GSI300/DTC300) in 10-pitch, 6 lines/inch mode and a line length of 65 characters:


mm filename ...
or nroff -T300 -h -mm filename ...


• DASI300 (GSI300/DTC300) in 12-pitch, 6 lines/inch mode and a line length of 80 (rather than
65) characters:


mm -12 filename ...
or nroff -T300-12 -rW80 -rO3 -h -mm filename ...


or, equivalently (and more succinctly):
nroff -T300-12 -rT1 -h -mm filename ...
• DASI450 in 10-pitch, 6 lines/inch mode:


mm -450 filename ...
or nroff -T450 -h -mm filename ...


• DASI450 in 12-pitch, 6 lines/inch mode:


mm -450 -12 filename ...
or nroff -T450-12 -rW80 -rO3 -h -mm filename ...
or nroff -T450-12 -rT1 -h -mm filename ...


• Hewlett-Packard HP264x CRT family:


mm -hp filename ...
or nroff -h -mm filename ... | hp


• Any terminal incapable of reverse paper motion (GE TermiNet, Texas Instruments 700 series, etc.):


mm -tn filename ...
or nroff -mm filename ... | col
• Versatec printer (see vp(I) for additional details):


vp [vp-options] "mm -rT2 -c filename ..."
or vp [vp-options] "nroff -rT2 -mm filename ... | col"


Of course, tbl(I) and eqn(I)/neqn(I), if needed, must be invoked as shown in the command line proto-
types at the beginning of this section.


If two-column processing {11.4} is used with nroff, the -c option must be specified to mm(I), or the
nroff output postprocessed by col(I). In the latter case, the -T37 terminal type must be specified to
nroff, the -h option must not be specified, and the output of col(I) must be processed by the appropriate
terminal filter (e.g., gsi(I)); mm(I) with the -c option handles all this automatically.


2.4 Parameters that Can Be Set from the Command Line


Number registers are commonly used within PWB/MM to hold parameter values that control various
aspects of output style. Many of these can be changed within the text files via .nr requests. In addi-
tion, some of these registers can be set from the command line itself, a useful feature for those param-
eters that should not be permanently embedded within the input text itself. If used, these registers
(with the possible exception of the register P; see below) must be set on the command line (or before
the PWB/MM macro definitions are processed) and their meanings are:


-rA1 has the effect of invoking the .AF macro without an argument {6.7.2}.


-rBn defines the macros for the cover sheet and the table of contents. If n is 1, table-of-contents pro-
cessing is enabled. If n is 2, then cover-sheet processing will occur. If n is 3, both will occur.
That is, B having a value greater than 0 defines the .TC {10.1} and/or .CS {10.2} macros. Note
that to have any effect, these macros must also be invoked.


-rCn sets the type of copy (e.g., DRAFT) to be printed at the bottom of each page. See {9.5}.
n = 1 for OFFICIAL FILE COPY.
n = 2 for DATE FILE COPY.
n = 3 for DRAFT.
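B and C accept only a few meaningful values, so a script that assembles -r flags could check them first. This hypothetical helper is not part of mm(I). Its value tables are transcribed from the descriptions of -rB and -rC above, and any other register is passed through unchecked.

```python
# Value tables transcribed from the -rB and -rC descriptions.
B_MACROS = {0: set(), 1: {"TC"}, 2: {"CS"}, 3: {"TC", "CS"}}
COPY_TYPES = {1: "OFFICIAL FILE COPY", 2: "DATE FILE COPY", 3: "DRAFT"}


def register_flags(**regs: int) -> list:
    """Turn one-letter register settings into -r flags, validating B and C."""
    if "B" in regs and regs["B"] not in B_MACROS:
        raise ValueError("B must be 0, 1, 2, or 3")
    if "C" in regs and regs["C"] not in COPY_TYPES:
        raise ValueError("C must be 1, 2, or 3")
    return [f"-r{name}{value}" for name, value in regs.items()]


print(register_flags(B=3, C=3))   # ['-rB3', '-rC3']
```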


-rD1 sets debug mode. This flag requests the formatter to attempt to continue processing even if
PWB/MM detects errors that would otherwise cause termination. It also includes some debugging
information in the default page header {9.2, 11.3}.


-rLk sets the length of the physical page to k lines.¹ The default value is 66 lines per page. This
parameter is used for obtaining 8 lines-per-inch output on 12-pitch terminals, or when directing
output to a Versatec printer.


-rNn specifies the page numbering style. When n is 0 (default), all pages get the (prevailing) header
{9.2}. When n is 1, the page header replaces the footer on page 1 only. When n is 2, the page
header is omitted from page 1. When n is 3, ‘‘section-page’’ numbering {4.5} occurs.

N    Page 1                        Pages 2 ff.
0    header                        header
1    header replaces footer        header
2    no header                     header
3    ‘‘section-page’’ as footer    same as page 1


The contents of the prevailing header and footer do not depend on the value of the number
register N; N only controls whether and where the header (and, for N = 3, the footer) is printed,
as well as the page numbering style. In particular, if the header and footer are null {9.2, 9.3},
the value of N is irrelevant.


-rOk offsets output k spaces to the right.¹ It is helpful for adjusting output positioning on some
terminals. NOTE: The register name is the capital letter "O", not the digit zero (0).


-rPn specifies that the pages of the document are to be numbered starting with n. This register may
also be set via a .nr request in the input text.


1. For nroff, k is an unscaled number representing lines or character positions; for troff, k must be scaled.
-rSn sets the point size and vertical spacing for the document. The default n is 10, i.e., 10-point type
on 12-point leading (vertical spacing), giving 6 lines per inch {11.8}. This parameter applies to
troff only.
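
The 6-lines-per-inch figure is simple arithmetic, taking 1 point as 1/72 of an inch {4.1}. Relating it
to the default 66-line page of -rL is an added illustration, not a statement about the macro package:

```latex
\text{lines per inch} = \frac{72\ \text{points per inch}}{12\ \text{points per line}} = 6,
\qquad
\frac{66\ \text{lines per page}}{6\ \text{lines per inch}} = 11\ \text{inches}.
```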


-rTn provides register settings for certain devices. If n is 1, then the line length and page offset are
set for output directed to a DASI300 or DASI450 in 12-pitch, 6 lines/inch mode, i.e., they are
set to 80 and 3, respectively. Setting n to 2 changes the page length to 84 lines per page and
inhibits underlining; it is meant for output sent to the Versatec printer. The default value for n
is 0. This parameter applies to nroff only.


-rU1 controls underlining of section headings. This flag causes only letters and digits to be
underlined. Otherwise, all characters (including spaces) are underlined {4.2.2.4.2}. This parameter
applies to nroff only.


-rWk page width (i.e., line length and title length) is set to k.² This can be used to change the page
width from the default value of 65 characters (6.5 inches).


2.5 Omission of -mm 


If a large number of arguments is required on the command line, it may be convenient to set up the 
first (or only) input file of a document as follows: 


zero or more initializations of registers listed in {2.4} 
.so /usr/lib/tmac.m
remainder of text 


In this case, one must not use the -mm flag (nor the mm(I) command); the .so request has the
equivalent effect, but the registers in {2.4} must be initialized before the .so request, because their
values are meaningful only if set before the macro definitions are processed. When using this method,
it is best to "lock" into the input file only those parameters that are seldom changed. For example:


.nr W 80
.nr O 10
.nr N 3
.nr B 1
.so /usr/lib/tmac.m
.H 1 "INTRODUCTION"


specifies, for nroff, a line length of 80, a page offset of 10, "section-page" numbering, and table of
contents processing.


3. FORMATTING CONCEPTS 
3.1 Basic Terms 


The normal action of the formatters is to fill output lines from one or more input lines. The output
lines may be justified so that both the left and right margins are aligned. As the lines are being filled,
words are hyphenated {3.4} as necessary. It is possible to turn any of these modes on and off (see .SA
{11.2}, Hy {3.4}, and the formatter .nf and .fi requests [9]). Turning off fill mode also turns off
justification and hyphenation.


Certain formatting commands (requests and macros) cause the filling of the current output line to 
cease, the line (of whatever length) to be printed, and the subsequent text to begin a new output line.
This printing of a partially filled output line is known as a break. A few formatter requests and most of 
the PWB/MM macros cause a break. 


While formatter requests can be used with PWB/MM, one must fully understand the consequences
and side-effects that each such request might have. Actually, there is little need to use formatter
requests; the macros described here should be used in most cases because:


2. For nroff, k is an unscaled number representing lines or character positions; for troff, k must be scaled.
• it is much easier to control (and change at any later point in time) the overall style of the document.
• complicated facilities (such as footnotes or tables of contents) can be obtained with ease.
• the user is insulated from the peculiarities of the formatter language.

A good rule is to use formatter requests only when absolutely necessary {3.9}. 


In order to make it easy to revise the input text at a later time, input lines should be kept short and 
should be broken at the end of clauses; each new full sentence must begin on a new line. 


3.2 Arguments and Double Quotes 


For any macro call, a null argument is an argument whose width is zero. Such an argument often has a
special meaning; the preferred form for a null argument is "". Note that omitting an argument is not the
same as supplying a null argument (for example, see the .MT macro in {6.6}). Furthermore, omitted
arguments can occur only at the end of an argument list, while null arguments can occur anywhere.


Any macro argument containing ordinary (paddable) spaces must be enclosed in double quotes (").³
Otherwise, it will be treated as several separate arguments.


Double quotes (") are not permitted as part of the value of a macro argument or of a string that is to
be used as a macro argument. If you must, use two grave accents (``) and/or two acute accents ('')
instead. This restriction is necessary because many macro arguments are processed (interpreted) a
variable number of times; for example, headings are first printed in the text and may be (re)printed in
the table of contents.


3.3 Unpaddable Spaces 


When output lines are justified to give an even right margin, existing spaces in a line may have
additional spaces appended to them. This may harm the desired alignment of text. To avoid this problem,
it is necessary to be able to specify a space that cannot be expanded during justification, i.e., an
unpaddable space. There are several ways to accomplish this.


First, one may type a backslash followed by a space ("\ "). This pair of characters directly
generates an unpaddable space. Second, one may sacrifice some seldom-used character to be translated
into a space upon output. Because this translation occurs after justification, the chosen character may be
used anywhere an unpaddable space is desired. The tilde (~) is often used for this purpose. To use it
in this way, insert the following at the beginning of the document:


.tr ~


If a tilde must actually appear in the output, it can be temporarily "recovered" by inserting:


.tr ~~


before the place where it is needed. Its previous usage is restored by repeating the ".tr ~", but only
after a break or after the line containing the tilde has been forced out. Note that the use of the tilde in
this fashion is not recommended for documents in which the tilde is used within equations.
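
To see why the order of the translation matters, here is a minimal Python sketch of justification. It is
not the formatter's algorithm: padding is added only at ordinary spaces, and "~" is turned into a space
afterwards, as .tr ~ does. Spreading any leftover padding from the left is an arbitrary choice of this
sketch, and justify is a hypothetical name.

```python
def justify(line: str, width: int) -> str:
    """Pad the ordinary spaces of `line` to `width`, then translate "~" to " ".

    Because the tilde is still a tilde while padding is added, the space it
    becomes is never widened: it behaves as an unpaddable space.
    """
    words = line.split(" ")
    if len(words) == 1:
        return line.replace("~", " ")
    gaps = len(words) - 1
    fill = width - sum(len(word) for word in words)
    if fill < gaps:
        raise ValueError("line is too long for the requested width")
    base, extra = divmod(fill, gaps)
    pieces = []
    for i, word in enumerate(words[:-1]):
        pieces.append(word)
        # Leftmost gaps absorb any remainder (an arbitrary choice here).
        pieces.append(" " * (base + (1 if i < extra else 0)))
    pieces.append(words[-1])
    return "".join(pieces).replace("~", " ")


print(repr(justify("one two three", 15)))  # both gaps are widened
print(repr(justify("one~two three", 15)))  # only the ordinary gap is widened
```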


3.4 Hyphenation 


The formatters (and, therefore, PWB/MM) will automatically hyphenate words, if need be. However,
the user may specify the hyphenation points for a specific occurrence of any word by the use of a
special character known as a hyphenation indicator, or may specify hyphenation points for a small list of
words (about 128 characters).


If the hyphenation indicator (initially, the two-character sequence "\%") appears at the beginning or
end of a word, the word is not hyphenated. Alternatively, it can be used to indicate legal hyphenation
point(s) inside a word. In any case, all occurrences of the hyphenation indicator disappear on output.


The user may specify a different hyphenation indicator: 


.HC [hyphenation-indicator]


3. A double quote (") is a single character that must not be confused with two apostrophes or acute accents (''), or with two
grave accents (``).
The circumflex (^) is often used for this purpose; this is done by inserting the following at the
beginning of a document:


.HC ^
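
The rules for the hyphenation indicator can be summarized as a small function. This is a minimal Python
sketch, not the formatter's hyphenation code: apply_indicator is a hypothetical name, and it only reports
where the indicator marks legal breaks. For a word with no indicator it returns None, meaning the
choice is left to the formatter's own hyphenation.

```python
from typing import List, Optional, Tuple


def apply_indicator(word: str, indicator: str = "\\%") -> Tuple[str, Optional[List[int]]]:
    """Return (printed word, legal break offsets) for one input word.

    []   -> indicator at the start or end: the word is not hyphenated
    None -> no indicator: hyphenation is left to the formatter
    """
    if word.startswith(indicator) or word.endswith(indicator):
        return word.replace(indicator, ""), []
    parts = word.split(indicator)
    if len(parts) == 1:
        return word, None
    points, offset = [], 0
    for part in parts[:-1]:
        offset += len(part)  # break allowed after this many printed characters
        points.append(offset)
    return "".join(parts), points  # every indicator disappears on output


print(apply_indicator("print\\%out"))
print(apply_indicator("print^out", "^"))  # after .HC ^
```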


Note that any word containing hyphens or dashes (also known as em dashes) will be hyphenated
immediately after a hyphen or dash if it is necessary to hyphenate the word, even if the formatter
hyphenation function is turned off.


Hyphenation can be turned off in the body of the text by specifying: 


.nr Hy 0


once at the beginning of the document. For hyphenation control within footnote text and across pages,
see {8.3}.


The user may supply, via the .hw request, a small list of words with the proper hyphenation points 
indicated. For example, to indicate the proper hyphenation of the word "printout," one may specify:


.hw print-out


3.5 Tabs


The macros .MT {6.6}, .TC {10.1}, and .CS {10.2} use the formatter .ta request to set tab stops, and
then restore the default values⁴ of tab settings. Thus, setting tabs to other than the default values is the
user's responsibility.


Note that a tab character is always interpreted with respect to its position on the input line, rather
than its position on the output line. In general, tab characters should appear only on lines processed in
"no-fill" mode {3.1}.


Also note that tbl(I) {7.3} changes tab stops, but does not restore the default tab settings.


3.6 Special Use of the BEL Character


The non-printing character BEL is used as a delimiter in many macros where it is necessary to compute 
the width of an argument or to delimit arbitrary text, e.g., in headers and footers {9}, headings {4}, and 
list marks {5}. Users who include BEL characters in their input text (especially in arguments to mac- 
ros) will receive mangled output. 


3.7 Bullets 


A bullet (•) is often obtained on a typewriter terminal by using an "o" overstruck by a "+". For
compatibility with troff, a bullet string is provided by PWB/MM. Rather than overstriking, use the
sequence:


\*(BU


wherever a bullet is desired. Note that the bullet list (.BL) macros {5.3.3.2} use this string to
automatically generate the bullets for the list items.


3.8 Dashes, Minus Signs, and Hyphens


Troff has distinct graphics for a dash, a minus sign, and a hyphen, while nroff does not. Those who
intend to use nroff only may use the minus sign ("-") for all three.


Those who wish mainly to use troff should follow the escape conventions of [9].


Those who want to use both formatters must take care during text preparation. Unfortunately,
these characters cannot be represented in a way that is both compatible and convenient. We suggest
the following approach: 


Dash      Type " -- " for each text dash. These can be left alone for nroff and later globally
          translated for troff to "\(em", namely an em dash (—). Note that the dash list (.DL)
          macros {5.3.3.3} automatically generate the em dashes for the list items.


4. Every eight characters in nroff; every ½ inch in troff.
Hyphen    Type "-" and use as is for both formatters. Nroff will print it as is, and troff will print a
          true hyphen.


Minus     Type "\-" for a true minus sign, regardless of formatter. Nroff will effectively ignore the
          "\", while troff will print a true minus sign.
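
The "later globally translated" step for dashes can be done with any editor or filter. Below is a
minimal Python sketch of such a filter; dashes_for_troff is a hypothetical name, and dropping the
spaces around "--" is an assumption of this sketch, not a PWB/MM rule. Hyphens and "\-" minus signs
are left untouched, as the conventions above require.

```python
def dashes_for_troff(text: str) -> str:
    """Translate each text dash typed as " -- " into troff's \\(em escape.

    Assumption: the surrounding spaces are dropped, so "a -- b" becomes "a\\(emb".
    Plain hyphens ("-") and true minus signs ("\\-") are not changed.
    """
    return text.replace(" -- ", "\\(em")


print(dashes_for_troff("The result -- as expected -- was re-checked: x \\- 1."))
```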


3.9 Use of Formatter Requests 


Most formatter requests [9] should not be used with PWB/MM because PWB/MM provides the
corresponding formatting functions in a much more user-oriented and surprise-free fashion than do the
basic formatter requests {3.1}. However, some formatter requests are useful with PWB/MM, namely:


.af   .br   .ce   .de   .ds   .fi   .hw   .ls   .nf   .nr
.nx   .rm   .rr   .rs   .so   .sp   .ta   .ti   .tl   .tr


The .fp, .lg, and .ss requests are also sometimes useful for troff. Use of other requests without fully
understanding their implications very often leads to disaster. 


4. PARAGRAPHS AND HEADINGS 


This section describes simple paragraphs and section headings. Additional paragraph and list styles are 
covered in {5}.


4.1 Paragraphs


.P [type] 
one or more lines of text. 


This macro is used to begin two kinds of paragraphs. In a left-justified paragraph, the first line begins at
the left margin, while in an indented paragraph, it is indented five spaces (see below).


A document possesses a default paragraph style obtained by specifying ".P" before each paragraph
that does not follow a heading {4.2}. The default style is controlled by the register Pt. The initial value
of Pt is 2, which provides indented paragraphs except after headings, lists, and displays, in which case
they are left-justified. All paragraphs can be forced to be left-justified by inserting the following at the
beginning of the document:


.nr Pt 0

All paragraphs can be forced to be indented by inserting:


.nr Pt 1

at the beginning of the document. 


The amount a paragraph is indented is contained in the register Pi, whose default value is 5. To
indent paragraphs by, say, 10 spaces, insert:


.nr Pi 10


at the beginning of the document. Of course, both the Pi and Pt register values must be greater than
zero for any paragraphs to be indented.


☞ Values that specify indentation must be unscaled and are treated as "character positions," i.e., as a
number of ens. In troff, an en is the number of points (1 point = 1/72 of an inch) equal to half the
current point size. In nroff, an en is equal to the width of a character.


Regardless of the value of Pt, an individual paragraph can be forced to be left-justified or indented.
".P 0" always forces left justification; ".P 1" always causes indentation by the amount specified by the
register Pi.


If .P occurs inside a list, the indent (if any) of the paragraph is added to the current list indent {5}.

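
The interaction of Pt, Pi, the .P argument, and an enclosing list can be summarized in a few lines.
This is a minimal Python sketch of the rules stated above, not PWB/MM code; the function names and the
follows_heading_list_or_display flag are hypothetical. Indents are in ens; ens_to_points applies the
troff definition of an en given in the note above.

```python
from typing import Optional


def paragraph_indent(pt: int = 2, pi: int = 5, p_type: Optional[int] = None,
                     follows_heading_list_or_display: bool = False,
                     list_indent: int = 0) -> int:
    """Indent, in ens, of a paragraph started with .P [type] (hypothetical model)."""
    if p_type == 0:
        indent = 0                      # ".P 0" always forces left justification
    elif p_type == 1:
        indent = pi                     # ".P 1" always indents by Pi
    elif pt == 1:
        indent = pi                     # all paragraphs indented
    elif pt == 2 and not follows_heading_list_or_display:
        indent = pi                     # default: indented except after headings, etc.
    else:
        indent = 0                      # Pt = 0, or Pt = 2 right after a heading/list/display
    return list_indent + indent        # inside a list, add the current list indent


def ens_to_points(ens: float, point_size: float = 10) -> float:
    """troff only: one en is half the current point size, measured in points."""
    return ens * point_size / 2


print(paragraph_indent(), paragraph_indent(follows_heading_list_or_display=True))
print(ens_to_points(5))  # default Pi at the default 10-point size
```

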
4.2 Numbered Headings 


.H level [heading-text]
zero or more lines of text 
The .H macro provides seven levels of numbered headings, as illustrated by this document. Level 1
is the most major or highest; level 7 the lowest.


☞ There is no need for a .P macro after a .H (or .HU {4.3}), because the .H macro also performs the
function of the .P macro. In fact, if a .P follows a .H, the user loses much of the flexibility provided by
the .H mechanism {4.2.2.2}.


4.2.1 Normal Appearance. The normal appearance of headings is as shown in this document. The
effect of .H varies according to the level argument. First-level headings are preceded by two blank lines
(one vertical space); all others are preceded by one blank line (½ a vertical space).


.H 1 heading-text    gives an underlined (bold) heading followed by a single blank line (½ a vertical
                     space). The following text begins on a new line and is indented according to the
                     current paragraph type. Full capital letters should normally be used to make the
                     heading stand out.


.H 2 heading-text    yields an underlined (bold) heading followed by a single blank line (½ a vertical
                     space). The following text begins on a new line and is indented according to the
                     current paragraph type. Normally, initial capitals are used.


.H n heading-text    for 3 ≤ n ≤ 7, produces an underlined (italic) heading followed by two spaces.
                     The following text appears on the same line, i.e., these are run-in headings.


Appropriate numbering and spacing (horizontal and vertical) occur even if the heading text is omitted
from a .H macro call.


Here are the first few .H calls of {4}:


.H 1 "PARAGRAPHS AND HEADINGS"
.H 2 "Paragraphs"
.H 2 "Numbered Headings"
.H 3 "Normal Appearance."
.H 3 "Altering Appearance of Headings."
.H 4 "Pre-Spacing and Page Ejection."
.H 4 "Spacing After Headings."
.H 4 "Centered Headings."
.H 4 "Bold, Italic, and Underlined Headings."
.H 5 "Control by Level."
4.2.2 Altering Appearance of Headings. Users satisfied with the default appearance of headings may skip
to {4.3}. One can modify the appearance of headings quite easily by setting certain registers and strings
at the beginning of the document. This permits quick alteration of a document's style, because this
style-control information is concentrated in a few lines, rather than being distributed throughout the
document.


4.2.2.1 Pre-Spacing and Page Ejection. A first-level heading normally has two blank lines (one vertical
space) preceding it, and all others have one blank line (½ a vertical space). If a multi-line heading
were to be split across pages, it is automatically moved to the top of the next page. Every first-level
heading may be forced to the top of a new page by inserting:


.nr Ej 1


at the beginning of the document. Long documents may be made more manageable if each section
starts on a new page. Setting Ej to a higher value causes the same effect for headings up to that level,
i.e., a page eject occurs if the heading level is less than or equal to Ej.


4.2.2.2 Spacing After Headings. Three registers control the appearance of text immediately following a
.H call. They are Hb (heading break level), Hs (heading space level), and Hi (post-heading indent).


If the heading level is less than or equal to Hb, a break {3.1} occurs after the heading. If the heading
level is less than or equal to Hs, a blank line (½ a vertical space) is inserted after the heading.
Defaults for Hb and Hs are 2. If a heading level is greater than Hb and also greater than Hs, then the
heading (if any) is run into the following text. These registers permit headings to be separated from
the text in a consistent way throughout a document, while allowing easy alteration of white space and
heading emphasis.


For any stand-alone heading, i.e., a heading not run into the following text, the alignment of the
next line of output is controlled by the register Hi. If Hi is 0, text is left-justified. If Hi is 1 (the
default value), the text is indented according to the paragraph type as specified by the register Pt {4.1}.
Finally, if Hi is 2, text is indented to line up with the first word of the heading itself, so that the
heading number stands out more clearly. Note that this feature is defeated if a .P macro follows the .H or
.HU macro {4.2}.


For example, to cause a blank line (½ a vertical space) to appear after the first three heading levels,
to have no run-in headings, and to force the text following all headings to be left-justified (regardless of
the value of Pt), the following should appear at the top of the document:


.nr Hs 3
.nr Hb 7
.nr Hi 0
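
The Hb and Hs rules reduce to two comparisons. A minimal Python sketch, assuming the result is coded
the same way as the register ;0 described in {4.6} (0 = run-in, 1 = break but no blank line, 2 = blank
line); post_heading_spacing is a hypothetical name.

```python
def post_heading_spacing(level: int, hb: int = 2, hs: int = 2) -> int:
    """What follows a heading: 0 = run-in, 1 = break only, 2 = blank line."""
    if not 1 <= level <= 7:
        raise ValueError("heading levels run from 1 to 7")
    if level <= hs:
        return 2        # blank line (which implies a break)
    if level <= hb:
        return 1        # break, but no blank line
    return 0            # greater than both Hb and Hs: run-in heading


print([post_heading_spacing(lv) for lv in range(1, 8)])              # defaults
print([post_heading_spacing(lv, hb=7, hs=3) for lv in range(1, 8)])  # example above
```

With the settings of the example, no level is run-in and only levels 1 through 3 get a blank line.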


4.2.2.3 Centered Headings. A heading is centered if its level is less than or equal to Hc, and if it is
also stand-alone {4.2.2.2}. Hc is 0 initially (no centered headings).


4.2.2.4 Bold, Italic, and Underlined Headings.
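
Page ejection (Ej) and centering (Hc) are also level comparisons, with centering further limited to
stand-alone headings. A minimal Python sketch of those two rules; heading_layout is a hypothetical
name, and the defaults for Ej and Hc are taken as 0 (no eject, no centering) as described above.

```python
def heading_layout(level: int, ej: int = 0, hc: int = 0, hb: int = 2, hs: int = 2) -> dict:
    """Page-eject and centering decisions for one heading (hypothetical model)."""
    stand_alone = level <= hb or level <= hs   # i.e., not run into the text
    return {
        "new_page": level <= ej,               # {4.2.2.1}
        "centered": stand_alone and level <= hc,
    }


print(heading_layout(1, ej=1, hc=2))
print(heading_layout(3, hc=3))  # run-in by default, so not centered
```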


4.2.2.4.1 Control by Level. Any heading that is underlined by nroff is made bold or italic by troff. The
string HF (heading font) contains seven codes that specify the fonts for heading levels 1-7. The legal
codes, their interpretations, and the defaults for HF are:


                              HF Code                           Default
Formatter     1               2             3                   HF
nroff         no underline    underline     underline           3 3 2 2 2 2 2
troff         roman           italic        bold                3 3 2 2 2 2 2

Thus, all levels are underlined in nroff; in troff, levels 1 and 2 are bold, levels 3 through 7 are italic.
The user may reset HF as desired. Any value omitted from the right end of the list is taken to be 1.


For example, the following would result in five underlined (bold) levels and two non-underlined
(roman) levels:


.ds HF 3 3 3 3 3
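
Padding the HF list on the right with 1s explains the example. A minimal Python sketch that expands an
HF value into per-level fonts using the table above; heading_fonts and FONTS are hypothetical names.

```python
FONTS = {
    "nroff": {1: "no underline", 2: "underline", 3: "underline"},
    "troff": {1: "roman", 2: "italic", 3: "bold"},
}


def heading_fonts(hf: str, formatter: str) -> list:
    """Per-level font for levels 1-7 from an HF string such as "3 3 2 2 2 2 2"."""
    codes = [int(code) for code in hf.split()]
    if len(codes) > 7 or any(code not in (1, 2, 3) for code in codes):
        raise ValueError("HF holds up to seven codes, each 1, 2, or 3")
    codes += [1] * (7 - len(codes))  # omitted right-hand values are taken to be 1
    return [FONTS[formatter][code] for code in codes]


print(heading_fonts("3 3 3 3 3", "troff"))
```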


4.2.2.4.2 Nroff Underlining Style. Nroff can underline in two ways. The normal style (.ul request) is to
underline only letters and digits. The continuous style (.cu request) underlines all characters, including
spaces. By default, PWB/MM attempts to use the continuous style on any heading that is to be
underlined, is not run-in, and is short enough to fit on a single line. If a heading is to be underlined, but is
either run-in or is too long, it is underlined the normal way (i.e., only letters and digits are underlined).


All underlining of headings can be forced to the normal way by using the -rU1 flag when invoking 
nroff {2.4}. 


4.2.2.5 Marking Styles: Numerals and Concatenation.
.HM [arg1] ... [arg7]


The registers named H1 through H7 are used as counters for the seven levels of headings. Their
values are normally printed using Arabic numerals. The .HM macro (heading mark style) allows this
choice to be overridden, thus providing "outline" and other document styles. This macro can have up
to seven arguments; each argument is a string indicating the type of marking to be used. Legal values
and their meanings are shown below; omitted values are interpreted as 1, while illegal values have no
effect.
Value     Interpretation
1         Arabic (default for all levels)
0001      Arabic with enough leading zeroes to get the specified number of digits
A         Upper-case alphabetic
a         Lower-case alphabetic
I         Upper-case Roman
i         Lower-case Roman


By default, the complete heading mark for a given level is built by concatenating the mark for that level
to the right of all marks for all levels of higher value. To inhibit the concatenation of heading level
marks, i.e., to obtain just the current level mark followed by a period, set the register Ht (heading-mark
type) to 1.


For example, a commonly-used "outline" style is obtained by:


.HM I A 1 a i
.nr Ht 1
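
The mark styles and the effect of Ht can be reproduced with a short function. This is a minimal Python
sketch, not the PWB/MM implementation, and all names are hypothetical. Two details are assumptions: the
trailing period on concatenated first-level marks is inferred from this document's own headings ("3.
FORMATTING CONCEPTS" versus "4.2 Numbered Headings"), and alphabetic marks past Z are assumed to continue
AA, AB, and so on.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_alpha(n: int) -> str:
    # Assumption: after Z the sequence continues AA, AB, ... (bijective base 26).
    out = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        out = chr(ord("A") + r) + out
    return out


def format_counter(n: int, style: str) -> str:
    """Render one heading counter in an .HM style: 1, 0001, A, a, I, or i."""
    if n == 0:
        return "0"  # assumption: an unused higher level shows as 0
    if style.isdigit():
        return str(n).zfill(len(style))  # "1" is plain Arabic, "001" pads to 3 digits
    if style in ("A", "a"):
        mark = to_alpha(n)
    elif style in ("I", "i"):
        mark = to_roman(n)
    else:
        raise ValueError(f"unknown heading mark style {style!r}")
    return mark if style.isupper() else mark.lower()


def heading_mark(counters, level, styles=("1",) * 7, ht=0):
    """Mark for a heading at `level`, given the counters H1..H7."""
    marks = [format_counter(counters[i], styles[i]) for i in range(level)]
    if ht == 1 or level == 1:
        return marks[-1] + "."   # just the current level's mark and a period
    return ".".join(marks)       # concatenated, e.g. 4.2.2


outline = ("I", "A", "1", "a", "i", "1", "1")  # .HM I A 1 a i
print(heading_mark([4, 2, 2, 0, 0, 0, 0], 3))
print(heading_mark([2, 3, 0, 0, 0, 0, 0], 2, outline, ht=1))
```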


4.3 Unnumbered Headings
.HU heading-text


.HU is a special case of .H; it is handled in the same way as .H, except that no heading mark is printed.
In order to preserve the hierarchical structure of headings when .H and .HU calls are intermixed, each 
.HU heading is considered to exist at the level given by register Hu, whose initial value is 2. Thus, in 
the normal case, the only difference between: 


.HU heading-text
and
.H 2 heading-text


is the printing of the heading mark for the latter. Both have the effect of incrementing the numbering 
counter for level 2, and resetting to zero the counters for levels 3 through 7. Typically, the value of 
Hu should be set to make unnumbered headings (if any) be the lowest-level headings in a document. 


.HU can be especially helpful in setting up Appendices and other sections that may not fit well into 
the numbering scheme of the main body of a document {13.2.1}. 
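
The counter behavior shared by .H and .HU is easy to state as code. A minimal Python sketch with
hypothetical names: the counter for the heading's level is incremented and all lower levels are reset
to zero, and .HU does the same at the level held in Hu.

```python
def advance(counters, level):
    """Counters H1..H7 after a heading at `level`: bump it, zero the lower levels."""
    if not 1 <= level <= 7:
        raise ValueError("heading levels run from 1 to 7")
    updated = list(counters)
    updated[level - 1] += 1
    for i in range(level, 7):
        updated[i] = 0
    return updated


def advance_hu(counters, hu=2):
    """.HU behaves like .H at the level held in register Hu (initially 2)."""
    return advance(counters, hu)


print(advance_hu([4, 2, 2, 1, 0, 0, 0]))
```

The same arithmetic explains the hint in {4.7}: with H1 set to 4, the next ".H 1" advances it to 5.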


4.4 Headings and the Table of Contents


The text of headings and their corresponding page numbers can be automatically collected for a table of 
contents. This is accomplished by doing the following three things: 


• specifying in the register Cl what level headings are to be saved,
• invoking the .TC macro {10.1} at the end of the document,
• and specifying -rB1 (or -rB3) {2.4} on the command line.


Any heading whose level is less than or equal to the value of the register Cl (contents level) is saved
and later displayed in the table of contents. The default value for Cl is 2, i.e., the first two levels of
headings are saved.


Due to the way the headings are saved, it is possible to exceed the formatter's storage capacity,
particularly when saving many levels of many headings, while also processing displays {7} and footnotes
{8}. If this happens, the "Out of temp file space" diagnostic {Appendix E} will be issued; the only
remedy is to save fewer levels and/or to have fewer words in the heading text.


4.5 First-Level Headings and the Page Numbering Style 


By default, pages are numbered sequentially at the top of the page. For large documents, it may be
desirable to use page numbering of the form "section-page," where section is the number of the
current first-level heading. This page numbering style can be achieved by specifying the flag -rN3 on
the command line {9.9}. As a side effect, this also sets Ej to 1, i.e., each section begins on a new
page. In this style, the page number is printed at the bottom of the page, so that the correct section
number is printed.


4.6 User Exit Macros
☞ This section is intended only for users who are accustomed to writing formatter macros.


.HX dlevel rlevel heading-text
.HZ dlevel rlevel heading-text


The .HX and .HZ macros are the means by which the user obtains a final level of control over the
previously-described heading mechanism. PWB/MM does not define .HX and .HZ; they are intended to
be defined by the user. The .H macro invokes .HX shortly before the actual heading text is printed; it
calls .HZ as its last action. All the default actions occur if these macros are not defined. If .HX or
.HZ (or both) are defined by the user, the user-supplied definition is interpreted at the appropriate
point. These macros can therefore influence the handling of all headings, because the .HU macro is
actually a special case of the .H macro.


If the user originally invoked the .H macro, then the derived level (dlevel) and the real level (rlevel)
are both equal to the level given in the .H invocation. If the user originally invoked the .HU macro
{4.3}, dlevel is equal to the contents of register Hu, and rlevel is 0. In both cases, heading-text is the text
of the original invocation.


By the time .H calls .HX, it has already incremented the heading counter of the specified level
{4.2.2.5}, produced blank line(s) (vertical space) to precede the heading {4.2.2.1}, and accumulated the
"heading mark", i.e., the string of digits, letters, and periods needed for a numbered heading. When
.HX is called, all user-accessible registers and strings can be referenced, as well as the following:


string }0     If rlevel is non-zero, this string contains the "heading mark." Two unpaddable spaces
              (to separate the mark from the heading) have been appended to this string. If rlevel is
              0, this string is null.


register ;0   This register indicates the type of spacing that is to follow the heading {4.2.2.2}. A
              value of 0 means that the heading is run-in. A value of 1 means a break (but no blank
              line) is to follow the heading. A value of 2 means that a blank line (½ a vertical
              space) is to follow the heading.


string }2     If register ;0 is 0, this string contains two unpaddable spaces that will be used to
              separate the (run-in) heading from the following text. If register ;0 is non-zero, this
              string is null.


register ;3   This register contains an adjustment factor for a .ne request issued before the heading
              is actually printed. On entry to .HX, it has the value 3 if dlevel equals 1, and 1
              otherwise. The .ne request is for the following number of lines: the contents of the
              register ;0 taken as blank lines (halves of vertical space) plus the contents of register
              ;3 as blank lines (halves of vertical space) plus the number of lines of the heading.


The user may alter the values of }0, }2, and ;3 within .HX as desired. The following are examples of 
actions that might be performed by defining .HX to include the lines shown: 


Change first-level heading mark from format n. to n.0:

        .if \\$1=1 .ds }0 \\n(H1.0\␣\␣        (␣ stands for a space)

Separate run-in heading from the text with a period and two unpaddable spaces:

        .if \\n(;0=0 .ds }2 .\␣\␣

Assure that at least 15 lines are left on the page before printing a first-level heading:

        .if \\$1=1 .nr ;3 15-\\n(;0

Add 3 additional blank lines before each first-level heading:

        .if \\$1=1 .sp 3
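
The size of the .ne request, and why the "15 lines" example subtracts ;0, can be checked with a little
arithmetic. A minimal Python sketch with hypothetical names; it counts nroff lines and ignores the fact
that, in troff, the ;0 and ;3 parts are halves of a vertical space.

```python
def default_adjust(dlevel: int) -> int:
    """Value of register ;3 on entry to .HX: 3 for first-level headings, else 1."""
    return 3 if dlevel == 1 else 1


def ne_amount(spacing: int, adjust: int, heading_lines: int) -> int:
    """Lines requested by .ne: ;0 + ;3 (as blank lines) + lines of heading text."""
    return spacing + adjust + heading_lines


# Default first-level heading: ;0 = 2 (blank line follows), ;3 = 3, one text line.
print(ne_amount(2, default_adjust(1), 1))
# With ".nr ;3 15-\\n(;0" the ;0 terms cancel: 15 plus the heading's own lines.
print(ne_amount(2, 15 - 2, 1))
```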


If temporary string or macro names are used within .HX, care must be taken in the choice of their 
names {13.1}. 
.HZ is called at the end of .H to permit user-controlled actions after the heading is produced. For
example, in a large document, sections may correspond to chapters of a book, and the user may want to
reset counters for footnotes, figures, tables, etc. Another use might be to change a page header or
footer. For example:


.de HZ
.if \\$1=1 \{.nr :p 0 \" footnotes
.   nr Fg 0 \" figures
.   nr Tb 0 \" tables
.   nr Ec 0 \" equations
.   PF "''Section \\$3''"\}
..


4.7 Hints for Large Documents 


A large document is often organized for convenience into one file per section. If the files are
numbered, it is wise to use enough digits in the names of these files for the maximum number of sections,
i.e., use suffix numbers 01 through 20 rather than 1 through 9 and 10 through 20.
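
One reason for the fixed number of digits: names sorted character by character, as in a directory
listing or a shell wildcard expansion, put "10" before "2". A minimal Python illustration; the file
names sec1 through sec20 are hypothetical.

```python
unpadded = [f"sec{i}" for i in range(1, 21)]
padded = [f"sec{i:02d}" for i in range(1, 21)]

# Character-by-character ordering puts sec10 through sec19 before sec2.
print(sorted(unpadded)[:4])
# With two-digit suffixes, sorted order is section order.
print(sorted(padded) == padded)
```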


Users often want to format individual sections of long documents. To do this with the correct section
numbers, it is necessary to set register H1 to 1 less than the number of the section just before the
corresponding ".H 1" call. For example, at the beginning of section 5, insert:


.nr H1 4


☞ This is a dangerous practice: it defeats the automatic (re)numbering of sections when sections are
added or deleted. Remove such lines as soon as possible.


5. LISTS


This section describes many different kinds of lists: automatically-numbered and alphabetized lists,
bullet lists, dash lists, lists with arbitrary marks, and lists starting with arbitrary strings, e.g., with terms
or phrases to be defined.


5.1 Basic Approach 


In order to avoid repetitive typing of arguments to describe the appearance of items in a list, PWB/MM 
provides a convenient way to specify lists. All lists are composed of the following parts: 


• A list-initialization macro that controls the appearance of the list: line spacing, indentation, marking
  with special symbols, and numbering or alphabetizing.

• One or more List Item (.LI) macros, each followed by the actual text of the corresponding list item.

• The List End (.LE) macro that terminates the list and restores the previous indentation.


Lists may be nested up to six levels. The list-initialization macro saves the previous list status
(indentation, marking style, etc.); the .LE macro restores it.


With this approach, the format of a list is specified only once at the beginning of that list. In addi- 
tion, by building on the existing structure, users may create their own customized sets of list macros 
with relatively little effort {5.4, Appendix A, Appendix B}.


5.2 Sample Nested Lists


The input for several lists and the corresponding output are shown below. The .AL and .DL macro
calls {5.3.3} contained therein are examples of the list-initialization macros. This example will help us to
explain the material in the following sections. Input text:


.AL A
.LI
This is an alphabetized item.
This text shows the alignment of the second line of the item.
The quick brown fox jumped over the lazy dog's back.
.AL
.LI
This is a numbered item.
This text shows the alignment of the second line of the item.
The quick brown fox jumped over the lazy dog's back.
.DL
.LI
This is a dash item.
This text shows the alignment of the second line of the item.
The quick brown fox jumped over the lazy dog's back.
.LI + 1
This is a dash item with a "plus" as prefix.
This text shows the alignment of the second line of the item.
The quick brown fox jumped over the lazy dog's back.
.LE
.LI
This is numbered item 2.
.LE
.LI
This is another alphabetized item, B.
This text shows the alignment of the second line of the item.
The quick brown fox jumped over the lazy dog's back.
.LE
.P
This paragraph appears at the left margin.


Output:

A.  This is an alphabetized item. This text shows the alignment of the second line of the item. The
    quick brown fox jumped over the lazy dog's back.

    1.  This is a numbered item. This text shows the alignment of the second line of the item. The
        quick brown fox jumped over the lazy dog's back.

        —   This is a dash item. This text shows the alignment of the second line of the item. The
            quick brown fox jumped over the lazy dog's back.

      + —   This is a dash item with a "plus" as prefix. This text shows the alignment of the second
            line of the item. The quick brown fox jumped over the lazy dog's back.

    2.  This is numbered item 2.

B.  This is another alphabetized item, B. This text shows the alignment of the second line of the item.
    The quick brown fox jumped over the lazy dog's back.

This paragraph appears at the left margin.


5.3 Basic List Macros


Because all lists share the same overall structure except for the list-initialization macro, we first discuss 
the macros common to all lists. Each list-initialization macro is covered in {5.3.3}. 


5.3.1 List Item.


.LI [mark] [1]


one or more lines of text that make up the list item. 
The .LI macro is used with all lists. It normally causes the output of a single blank line (½ a vertical
space) before its item, although this may be suppressed. If no arguments are given, it labels its
item with the current mark, which is specified by the most recent list-initialization macro. If a single
argument is given to .LI, that argument is output instead of the current mark. If two arguments are
given, the first argument becomes a prefix to the current mark, thus allowing the user to emphasize one
or more items in a list. One unpaddable space is inserted between the prefix and the mark. For
example:


.BL 6
.LI
This is a simple bullet item.
.LI +
This replaces the bullet with a ``plus.''
.LI + 1
But this uses ``plus'' as prefix to the bullet.
.LE


yields:
    • This is a simple bullet item.
    + This replaces the bullet with a “plus.”
  + • But this uses “plus” as prefix to the bullet.


☞ The mark must not contain ordinary (paddable) spaces, because alignment of items will be lost if the right
margin is justified {3.3}.


If the current mark (in the current list) is a null string, and the first argument of .LI is omitted or null,
the resulting effect is that of a hanging indent, i.e., the first line of the following text is “outdented,”
starting at the same place where the mark would have started {5.3.3.6}.
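The .LI labeling rules reduce to a small decision procedure. The Python function below is only a sketch of those rules, not part of PWB/MM. The name li_label is hypothetical. A no-break space stands in for troff's unpaddable space. The text above does not say how a null single argument behaves when the current mark is non-null, so treating it like an omitted argument is an assumption.

```python
NBSP = "\u00a0"  # stand-in for the unpaddable space between prefix and mark


def li_label(current_mark, *args):
    """Model of the label printed by .LI (a sketch of the rules in 5.3.1).

    current_mark -- mark set by the most recent list-initialization macro
    args         -- the arguments given to .LI, as strings
    Returns None when the result is a hanging indent.
    """
    if len(args) >= 2:
        # Two arguments: the first becomes a prefix to the current mark.
        return args[0] + NBSP + current_mark
    # One non-null argument replaces the mark; an omitted (or, by
    # assumption, null) argument leaves the current mark in place.
    mark = args[0] if args and args[0] else current_mark
    # A null mark with no replacement gives a hanging indent.
    return mark or None


print(li_label("•"))            # simple bullet item
print(li_label("•", "+"))       # "+" replaces the bullet
print(li_label("•", "+", "1"))  # "+" as prefix to the bullet
```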


5.3.2 List End.
.LE [1]


List End restores the state of the list back to that existing just before the most recent list-initialization 
macro call. If the optional argument is given, the .LE outputs a blank line (½ a vertical space). This
option should generally be used only when the .LE is followed by running text, but not when followed 
by a macro that produces blank lines of its own, such as .P, .H, or .LI. 


.H and .HU automatically clear all list information, so one may legally omit the .LE(s) that would
normally occur just before either of these macros. Such a practice is not recommended, however,
because errors will occur if the list text is separated from the heading at some later time (e.g., by
insertion of text).


5.3.3 List Initialization Macros. The following are the various list-initialization macros. They are
actually implemented as calls to the more basic .LB macro {5.4}.


5.3.3.1 Automatically-Numbered or Alphabetized Lists.
.AL [type] [text-indent] [1] 


The .AL macro is used to begin sequentially-numbered or alphabetized lists. If there are no arguments,
the list is numbered, and text is indented Li (initially 5)⁵ spaces from the indent in force when the .AL
is called, thus leaving room for two digits, a period, and two spaces before the text.


The type argument may be given to obtain a different type of sequencing, and its value should
indicate the first element in the sequence desired, i.e., it must be 1, A, a, I, or i {4.2.2.5}.⁶ If type is
omitted or null, then “1” is assumed. If text-indent is non-null, it is used as the number of spaces from the
current indent to the text, i.e., it is used instead of Li for this list only. If text-indent is null, then the
value of Li will be used.


5. Values that specify indentation must be unscaled and are treated as “character positions,” i.e., as the number of ens.
6. Note that the “0001” format is not permitted.




If the third argument is given, a blank line (½ a vertical space) will not separate the items in the list.
A blank line (½ a vertical space) will occur before the first item, however.
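For illustration, here is a Python sketch of the value that an .AL list of each type assigns to its n-th item. The list macros add the period themselves. The sketch models the text above rather than the macro code, and al_value is a hypothetical name. The manual does not say what follows Z, so the continuation AA, AB, ... is an assumption.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def al_value(n, seq_type="1"):
    """Return the n-th value (n >= 1) of an .AL sequence of the given type."""
    if seq_type in ("", "1"):          # omitted or null type means "1"
        return str(n)
    if seq_type in ("A", "a"):
        letters = ""
        while n > 0:                   # A..Z, then AA, AB, ... (assumption)
            n, r = divmod(n - 1, 26)
            letters = chr(ord("A") + r) + letters
        return letters if seq_type == "A" else letters.lower()
    if seq_type in ("I", "i"):
        numeral = ""
        for value, symbol in ROMAN:
            while n >= value:
                numeral += symbol
                n -= value
        return numeral if seq_type == "I" else numeral.lower()
    # Anything else, such as "0001", is not a legal type.
    raise ValueError("type must be 1, A, a, I, or i")


print([al_value(k, "A") + "." for k in (1, 2, 3)])  # ['A.', 'B.', 'C.']
```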


5.3.3.2 Bullet List.
.BL [text-indent] [1] 


.BL begins a bullet list, in which each item is marked by a bullet (•) followed by one space. If text-indent
is non-null, it overrides the default indentation, which is the amount of paragraph indentation given in
the register Pi {4.1}.⁷


If a second argument is specified, no blank lines will separate the items in the list. 
5.3.3.3 Dash List. 

.DL [text-indent] [1]
.DL is identical to .BL, except that a dash is used instead of a bullet.
5.3.3.4 Marked List. 

.ML mark [text-indent] [1]


.ML is much like .BL and .DL, but expects the user to specify an arbitrary mark, which may consist of
more than a single character. Text is indented text-indent spaces if the second argument is not null;
otherwise, the text is indented one more space than the width of mark. If the third argument is
specified, no blank lines will separate the items in the list.


☞ The mark must not contain ordinary (paddable) spaces, because alignment of items will be lost if the right
margin is justified {3.3}.


5.3.3.5 Reference List.
.RL [text-indent] [1]


A .RL call begins an automatically-numbered list in which the numbers are enclosed by square brackets
([ ]). Text-indent may be supplied, as for .AL. If omitted or null, it is assumed to be 6, a convenient
value for lists numbered up to 99. If the second argument is specified, no blank lines will separate the
items in the list. The list of references {14} was produced using the .RL macro.


5.3.3.6 Variable-Item List.
.VL text-indent [mark-indent] [1]


When a list begins with a .VL, there is effectively no current mark; it is expected that each .LI will
provide its own mark. This form is typically used to display definitions of terms or phrases. Mark-indent
gives the number of spaces from the current indent to the beginning of the mark, and it defaults to 0 if
omitted or null. Text-indent gives the distance from the current indent to the beginning of the text. If
the third argument is specified, no blank lines will separate the items in the list. Here is an example of
.VL usage:


7. So that, in the default case, the text of bullet and dash lists lines up with the first line of indented paragraphs.

.VL 20 2
.LI mark~1
Here is a description of mark 1;
``mark 1'' of the .LI line contains a tilde translated to an unpaddable space in order
to avoid extra spaces between
``mark'' and ``1'' {3.3}.
.LI second~mark
This is the second mark, also using a tilde translated to an unpaddable space.
.LI third~mark~longer~than~indent:
This item shows the effect of a long mark; one space separates the mark
from the text.
.LI ~
This item effectively has no mark because the
tilde following the .LI is translated into a space.
.LE
yields:
  mark 1            Here is a description of mark 1; “mark 1” of the .LI line contains a tilde
                    translated to an unpaddable space in order to avoid extra spaces between “mark”
                    and “1” {3.3}.

  second mark       This is the second mark, also using a tilde translated to an unpaddable space.

  third mark longer than indent: This item shows the effect of a long mark; one space separates the
                    mark from the text.

                    This item effectively has no mark because the tilde following the .LI is translated
                    into a space.


The tilde argument on the last .LI above is required; otherwise a hanging indent would have been
produced. A hanging indent is produced by using .VL and calling .LI with no arguments or with a null first
argument. For example:


.VL 10
.LI
Here is some text to show a hanging indent.
The first line of text is at the left margin.
The second is indented 10 spaces.
.LE


yields:
Here is some text to show a hanging indent. The first line of text is at the left margin. The second is
          indented 10 spaces.

☞ The mark must not contain ordinary (paddable) spaces, because alignment of items will be lost if the right
margin is justified {3.3}.
5.4 List-Begin Macro and Customized Lists
.LB text-indent mark-indent pad type [mark] [LI-space] [LB-space]


The list-initialization macros described above suffice for almost all cases. However, if necessary, one
may obtain more control over the layout of lists by using the basic list-begin macro .LB, which is also
used by all the other list-initialization macros {Appendix A}. Its arguments are as follows:


Text-indent gives the number of spaces that the text is to be indented from the current indent.
Normally, this value is taken from the register Li for automatic lists and from the register Pi for bullet and
dash lists.


The combination of mark-indent and pad determines the placement of the mark. The mark is placed
within an area (called the mark area) that starts mark-indent spaces to the right of the current indent, and
ends where the text begins (i.e., ends text-indent spaces to the right of the current indent). Within the
mark area, the mark is left-justified if pad is 0. If pad is greater than 0, say n, then n blanks are
appended to the mark, and the mark-indent value is ignored. The resulting string immediately precedes the
text. That is, the mark is effectively right-justified pad spaces immediately to the left of the text.
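A small model makes the placement rule easy to check. The Python sketch below assumes every character is one en wide, as in nroff, and is not the .LB implementation. The name lead_in is hypothetical. For a pad-0 mark that fills the mark area, the sketch adds one separating space, a behavior borrowed from the long .VL mark in {5.3.3.6}.

```python
def lead_in(mark, text_indent, mark_indent, pad):
    """Sketch of the .LB mark-placement rule, one character per en.

    Returns the string printed between the current indent and the text
    on the first line of an item.
    """
    if pad > 0:
        # mark_indent is ignored: append pad blanks, then right-justify so
        # that the field ends exactly where the text begins.
        return (mark + " " * pad).rjust(text_indent)
    # pad == 0: the mark is left-justified in the mark area.
    field = " " * mark_indent + mark
    if len(field) < text_indent:
        return field.ljust(text_indent)
    # Assumed behavior for a mark that fills the area: one space
    # separates it from the text (compare the long .VL mark in 5.3.3.6).
    return field + " "


print(repr(lead_in("1.", 5, 0, 2)))       # ' 1.  '
print(repr(lead_in("mark 1", 20, 2, 0)))  # mark starts 2 spaces in
```

For instance, with text-indent 5 and pad 2 (values chosen here for illustration), the marks “1.” and “10.” both end two spaces before the text.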


Type and mark interact to control the type of marking used. If type is 0, simple marking is
performed using the mark character(s) found in the mark argument. If type is greater than 0, automatic
numbering or alphabetizing is done, and mark is then interpreted as the first item in the sequence to be
used for numbering or alphabetizing, i.e., it is chosen from the set (1, A, a, I, i) as in {5.3.3.1}. That
is:

    Type   Mark              Result
    0      omitted           hanging indent
    0      string            string is the mark
    >0     omitted           arabic numbering
    >0     one of:           automatic numbering or
           1, A, a, I, i     alphabetic sequencing


Each non-zero value of type from 1 to 6 selects a different way of displaying the items. The following
table shows the output appearance for each value of type:

    Type   Appearance
    1      x.
    2      x)
    3      (x)
    4      [x]
    5      <x>
    6      {x}

where x is the generated number or letter.
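The decoration is a simple lookup, assuming the six appearances x., x), (x), [x], <x>, and {x} for types 1 through 6. Here is a Python sketch of that lookup. The names APPEARANCE and decorate are hypothetical.

```python
# Appearance of a generated number or letter x for each non-zero type (1-6).
APPEARANCE = {1: "{}.", 2: "{})", 3: "({})", 4: "[{}]", 5: "<{}>", 6: "{{{}}}"}


def decorate(x, list_type):
    """Wrap the generated value x according to an .LB type from 1 to 6."""
    if list_type not in APPEARANCE:
        raise ValueError("type must be 1 through 6 for automatic marking")
    return APPEARANCE[list_type].format(x)


print([decorate("3", t) for t in range(1, 7)])
# ['3.', '3)', '(3)', '[3]', '<3>', '{3}']
```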


☞ The mark must not contain ordinary (paddable) spaces, because alignment of items will be lost if the right
margin is justified {3.3}.


LI-space gives the number of blank lines (halves of a vertical space) that should be output by each .LI
macro in the list. If omitted, LI-space defaults to 1; the value 0 can be used to obtain compact lists. If
LI-space is greater than 0, the .LI macro issues a .ne request for two lines just before printing the mark.


LB-space, the number of blank lines (½ a vertical space) to be output by .LB itself, defaults to 0 if
omitted.


There are three reasonable combinations of LI-space and LB-space. The normal case is to set LI-space
to 1 and LB-space to 0, yielding one blank line before each item in the list; such a list is usually
terminated with a “.LE 1” to end the list with a blank line. In the second case, for a more compact
list, set LI-space to 0 and LB-space to 1, and, again, use “.LE 1” at the end of the list. The result is a
list with one blank line before and after it. If you set both LI-space and LB-space to 0, and use “.LE”
to end the list, a list without any blank lines will result.
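The three combinations can be compared with a Python sketch that lists where blank lines fall in nroff output. It models the rules above and is not PWB/MM code. The name list_layout is hypothetical, and each item is shown as a single line.

```python
def list_layout(items, li_space=1, lb_space=0, le_blank=False):
    """Sketch of where blank lines fall in an nroff list.

    "" stands for one blank line; each item is shown as a single line.
    li_space -- blank lines output by each .LI
    lb_space -- blank lines output by .LB itself
    le_blank -- True for ".LE 1", False for a plain ".LE"
    """
    lines = [""] * lb_space
    for item in items:
        lines += [""] * li_space + [item]
    if le_blank:
        lines.append("")
    return lines


print(list_layout(["a", "b"], 1, 0, True))   # ['', 'a', '', 'b', '']
print(list_layout(["a", "b"], 0, 1, True))   # ['', 'a', 'b', '']
print(list_layout(["a", "b"], 0, 0, False))  # ['a', 'b']
```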


Appendix A shows the definitions of the list-initialization macros {5.3.3} in terms of the .LB macro. 
Appendix B illustrates how the user can build upon those macros to obtain other kinds of lists. 


6. MEMORANDUM AND RELEASED PAPER STYLES 


One use of PWB/MM is for the preparation of memoranda and released papers, which have special 
requirements for the first page and for the cover sheet. The information needed for the memorandum 
or released paper (title, author, date, case numbers, etc.) is entered in the same way for both styles: an
argument to one macro indicates which style is being used. The following sections describe the macros 
used to provide this data. The required order is shown in {6.9}. 


8. The mark-indent argument is typically 0.


If neither the memorandum nor released-paper style is desired, the macros described below should
be omitted from the input text. If these macros are omitted, the first page will simply have the page
header {9} followed by the body of the document.


6.1 Title 


.TL [charging-case] [filing-case]
one or more lines of title text 


The arguments to the .TL macro are the charging case number(s) and filing case number(s).⁹ The title
of the memorandum or paper follows the .TL macro and is processed in fill mode {3.1}. Multiple
charging case numbers are entered as “sub-arguments” by separating each from the previous with a
comma and a space, and enclosing the entire argument within double quotes. Multiple filing case
numbers are entered similarly. For example:


.TL "12345, 67890" 987654321
On the construction of a table 
of all even prime numbers 


The .br request may be used to break the title into several lines. 


On output, the title appears after the word “subject” in the memorandum style. In the released-
paper style, the title is centered and underlined (bold). 


6.2 Author(s)
.AU name [initials] [loc] [dept] [ext] [room] [arg] [arg] [arg]


The .AU macro receives as arguments information that describes an author. If any argument contains 
blanks, it must be enclosed within double quotes. The first six arguments must appear in the order 
given (a separate .AU macro is required for each author). For example: 


.AU "J. J. Jones" JJJ PY 9876 5432 1Z-234


In the “from” portion in the memorandum style, the author's name is followed by location and
department number on one line and by room number and extension number on the next. The “x” for the
extension is added automatically. The printing of the location, department number, extension number,
and room number may be suppressed on the first page of a memorandum by setting the register Au to
0; the default value for Au is 1. Arguments 7 through 9, if present, will follow this “normal” author
information, each on a separate line. Certain organizations have their own numbering schemes for
memoranda, engineer's notes, etc. These numbers are printed after the author's name. This can be
done by providing more than six arguments to the .AU macro, e.g.:


.AU "S. P. Lename" SPL IH 9988 7766 5H-444 3322.11AB


The name, initials, location, and department are also used in the Signature Block {6.11.1}. The author
information in the “from” portion, as well as the names and initials in the Signature Block, will appear
in the same order as the .AU macros.


The names of the authors in the released-paper style are centered below the title. After the name of
the last author, “Bell Laboratories” and the location are centered. For the case of authors from
different locations, see {6.8}.


6.3 TM Number(s) 
.TM [number] ...


If the memorandum is a Technical Memorandum, the TM numbers are supplied via the .TM macro.
Up to nine numbers may be specified. Example: 


.TM 7654321 77777777


This macro call is ignored in the released-paper and external-letter styles {6.6}.


9. The “charging case” is the case number to which time was charged for the development of the project described in the
memorandum. The “filing case” is a number under which the memorandum is to be filed.


6.4 Abstract 


.AS [arg] [indent] 
text of the abstract 
.AE


In both the memorandum and released-paper styles, the text of the abstract follows the author informa- 
tion and is preceded by the centered and underlined (italic) word “ABSTRACT.” 


The .AS (abstract start) and .AE (abstract end) macros bracket the (optional) abstract. The first 
argument to .AS controls the printing of the abstract. If it is 0 or null, the abstract is printed on the 
first page of the document, immediately following the author information, and is also saved for the 
cover sheet. If the first argument is 1, the abstract is saved and printed only on the cover sheet. The 
margins of the abstract are indented on the left and right by five spaces. The amount of indentation 
can be changed by specifying the desired indentation as the second argument.¹⁰


Note that headings {4.2, 4.3}, displays {7}, and footnotes {8} are not (as yet) permitted within an
abstract. 


6.5 Other Keywords 
.OK [keyword] ... 


Topical keywords should be specified on a Technical Memorandum cover sheet. Up to nine such
keywords or keyword phrases may be specified as arguments to the .OK macro; if any keyword contains
spaces, it must be enclosed within double quotes.


6.6 Memorandum Types 
.MT [type] [1]


The .MT macro controls the format of the top part of the first page of a memorandum or of a released 
paper, as well as the format of the cover sheets. Legal codes for type and the corresponding values are:


    Code           Value
    .MT ""         no memorandum type is printed
    .MT 0          no memorandum type is printed
    .MT            MEMORANDUM FOR FILE
    .MT 1          MEMORANDUM FOR FILE
    .MT 2          PROGRAMMER'S NOTES
    .MT 3          ENGINEER'S NOTES
    .MT 4          Released-Paper style
    .MT 5          External-Letter style
    .MT "string"   string


If type indicates a memorandum style, then the corresponding value will be printed after the last line of
author information or after the last line of the abstract, if one appears on the first page. If type is longer
than one character, then it, itself, will be printed. For example:


.MT "Technical Note #5"

A simple letter is produced by calling .MT with a null (but not omitted!) or zero argument.


The second argument to .MT is used only if the first argument is 4 (i.e., for the released-paper
style) as explained in {6.8}.


In the external-letter style (.MT 5), only the date is printed in the upper right corner of the first
page. It is expected that preprinted stationery will be used, providing the author's company logotype
and address.
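The selection rules for the first .MT argument can be modeled as follows. This Python sketch is illustrative only, and mt_effect is a hypothetical name. It assumes, as the code table indicates, that an omitted argument yields MEMORANDUM FOR FILE while a null or zero argument yields no memorandum type. The text does not cover a single character outside 0-5, and this sketch raises KeyError for one.

```python
MT_HEADINGS = {
    "0": None,  # no memorandum type is printed
    "1": "MEMORANDUM FOR FILE",
    "2": "PROGRAMMER'S NOTES",
    "3": "ENGINEER'S NOTES",
    "4": "Released-Paper style",
    "5": "External-Letter style",
}


def mt_effect(arg=None):
    """Sketch of what .MT selects; arg=None means the argument was omitted."""
    if arg is None:
        return "MEMORANDUM FOR FILE"   # omitted argument
    if arg == "":
        return None                    # null argument: a simple letter
    if len(arg) > 1:
        return arg                     # a longer string is printed itself
    return MT_HEADINGS[arg]


print(mt_effect("Technical Note #5"))  # Technical Note #5
```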


10. Values that specify indentation must be unscaled and are treated as “character positions,” i.e., as the number of ens.


6.7 Date and Format Changes 


6.7.1 Changing the Date. By default, the current date appears in the “date” part of a memorandum.
This can be overridden by using: 


.ND new-date 
The .ND macro alters the value of the string DT, which is initially set to the current date.


6.7.2 Alternate First-Page Format. One can specify that the words “subject,” “date,” and “from” (in
the memorandum style) be omitted and that an alternate company name be used:


.AF [company-name] 


If an argument is given, it replaces “Bell Laboratories” without affecting the other headings. If the
argument is null, “Bell Laboratories” is suppressed; in this case, extra blank lines are inserted to allow
room for stamping the document with a Bell System logo or a Bell Laboratories stamp. .AF with no
argument suppresses “Bell Laboratories” and the “Subject/Date/From” headings, thus allowing output
on preprinted stationery.


The only .AF option appropriate for troff is to specify an argument to replace “Bell Laboratories”
with another name.


6.8 Released-Paper Style 
The released-paper style is obtained by specifying: 
.MT 4 [1]


This results in a centered, underlined (bold) title followed by centered names of authors. The location 
of the last author is used as the location following “Bell Laboratories” (unless .AF {6.7.2} specifies a
different company). If the optional second argument to .MT is given, then the name of each author is 
followed by the respective company name and location. The abstract, if present, follows the author 
information. 


Information necessary for the memorandum style but not for the released-paper style is ignored. 


If the released-paper style is utilized, most BTL location codes¹¹ are defined as strings that are the
addresses of the corresponding BTL locations. These codes are needed only until the .MT macro is
invoked. Thus, following the .MT macro, the user may re-use these string names. In addition, the
macros described in {6.11} and their associated lines of input are ignored when the released-paper style
is specified.


Authors from non-BTL locations may include their affiliations in the released-paper style by
specifying the appropriate .AF before each .AU. For example:


.TL
A Learned Treatise
.AF "Getem Inc."
.AU "F. Swatter"
.AF "Bell Laboratories"
.AU "Sam P. Lename" "" CB
.MT 4 1


6.9 Order of Invocation of “Beginning” Macros


The macros described in {6.1-6.7}, if present, must be given in the following order:


11. The complete list is: AK, CP, CH, CB, DR, HO, IN, IH, MV, MH, PY, RR, RD, WV, and WH.


.ND new-date
.TL [charging-case] [filing-case]
one or more lines of text
.AF [company-name]
.AU name [initials] [loc] [dept] [ext] [room] [arg] [arg] [arg]
.TM [number] ...
.AS [arg] [indent]
one or more lines of text
.AE
.OK [keyword] ...
.MT [type] [1]


The only required macros for a memorandum or a released paper are .TL, .AU, and .MT; all the others
(and their associated input lines) may be omitted if the features they provide are not needed. Once
.MT has been invoked, none of the above macros can be re-invoked because they are removed from
the table of defined macros to save space.


6.10 Example 
The input text for this manual begins as follows: 


.TL

P\s-3 WB/MM\s0\(emProgrammer’s Workbench Memorandum Macros 
.AU "D. W. Smith" DWS PY ...
.AU "J. R. Mashey" JRM MH ...
.MT 4 1


6.11 Macros for the End of a Memorandum 


At the end of a memorandum (but not of a released paper), the signatures of the authors and a list of
notations¹² can be requested. The following macros and their input are ignored if the released-paper
style is selected.


6.11.1 Signature Block. 
.SG [arg] [1]


.SG prints the author name(s) after the last line of text, aligned with the “Date/From” block. Three
blank lines are left above each name for the actual signature. If no argument is given, the line of
reference data¹³ will not appear following the last line.


A non-null first argument is treated as the typist’s initials, and is appended to the reference data. 
Supply a null argument to print reference data with neither the typist's initials nor the preceding
hyphen. 


If there are several authors and if the second argument is given, then the reference data is placed on 
the same line as the name of the first author, rather than on the line that has the name of the last 
author. 


The reference data contains only the location and department number of the first author. Thus, if
there are authors from different departments and/or from different locations, the reference data should 
be supplied manually after the invocation (without arguments) of the .SG macro. For example: 


.SG
.rs
.sp -1v
PY/MH-9876/5432-JJJ/SPL-cen
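A manually supplied reference-data line follows the fixed format of footnote 13: location code, department number, author's initials, and typist's initials, separated by hyphens. It can therefore be composed mechanically. Here is a Python sketch. reference_data is a hypothetical helper outside PWB/MM, and joining several authors' fields with slashes is inferred from the example above.

```python
def reference_data(authors, typist=None):
    """Compose a reference-data line from (location, department, initials) tuples.

    Sketch only: fields from several authors are joined with "/", as in the
    manual example; typist=None or "" omits the typist's initials together
    with the preceding hyphen.
    """
    locations = "/".join(a[0] for a in authors)
    departments = "/".join(a[1] for a in authors)
    initials = "/".join(a[2] for a in authors)
    line = f"{locations}-{departments}-{initials}"
    return f"{line}-{typist}" if typist else line


print(reference_data([("PY", "9876", "JJJ"), ("MH", "5432", "SPL")], "cen"))
# PY/MH-9876/5432-JJJ/SPL-cen
```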


12. See [2], pp. 1.12-16.


13. The following information is known as reference data: location code, department number, author's initials, and typist's
initials, all separated by hyphens. See [2], page 1.11.


6.11.2 “Copy to” and Other Notations.


.NS [arg] 
zero or more lines of the notation 
.NE 


After the signature and reference data, many types of notations may follow, such as a list of
attachments or “copy to” lists. The various notations are obtained through the .NS macro, which provides
for the proper spacing and for breaking the notations across pages, if necessary.


The codes for arg and the corresponding notations are:

    Code            Notation
    .NS             Copy to
    .NS ""          Copy to
    .NS 0           Copy to
    .NS 1           Copy (with att.) to
    .NS 2           Copy (without att.) to
    .NS 3           Att.
    .NS 4           Atts.
    .NS 5           Enc.
    .NS 6           Encs.
    .NS 7           Under Separate Cover
    .NS 8           Letter to
    .NS 9           Memorandum to
    .NS "string"    Copy (string) to


If arg consists of more than one character, it is placed within parentheses between the words “Copy”
and “to.” For example:


.NS "with att. 1 only"


will generate “Copy (with att. 1 only) to” as the notation. More than one notation may be specified
before the .NE occurs, because a .NS macro terminates the preceding notation, if any. For example:


.NS 4
Attachment 1-List of register names
Attachment 2-List of string and macro names
.NS 1
J. J. Jones
.NS 2
S. P. Lename
G. H. Hurtz
.NE


would be formatted as:

Atts.
Attachment 1-List of register names
Attachment 2-List of string and macro names

Copy (with att.) to
J. J. Jones

Copy (without att.) to
S. P. Lename
G. H. Hurtz
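The .NS codes amount to a lookup table plus one rule for longer arguments. Here is a Python sketch that models them. The names NS_NOTATIONS and ns_notation are hypothetical.

```python
NS_NOTATIONS = {
    "": "Copy to", "0": "Copy to",
    "1": "Copy (with att.) to", "2": "Copy (without att.) to",
    "3": "Att.", "4": "Atts.", "5": "Enc.", "6": "Encs.",
    "7": "Under Separate Cover", "8": "Letter to", "9": "Memorandum to",
}


def ns_notation(arg=""):
    """Sketch of the notation heading chosen by .NS arg (omitted = null)."""
    if len(arg) > 1:
        # A longer argument goes in parentheses between "Copy" and "to".
        return f"Copy ({arg}) to"
    return NS_NOTATIONS[arg]


print(ns_notation("4"))                 # Atts.
print(ns_notation("with att. 1 only"))  # Copy (with att. 1 only) to
```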


6.12 Forcing a One-Page Letter 


At times, one would like just a bit more space on the page, forcing the signature or items within
notations onto the bottom of the page, so that the letter or memo is just one page in length. This can be
accomplished by increasing the page length through the -rL option, e.g., -rL90. This has the effect of
making the formatter believe that the page is 90 lines long and therefore giving it more room than
usual to place the signature or the notations. This will only work for a single-page letter or memo.


7. DISPLAYS 


Displays are blocks of text that are to be kept together, not split across pages. PWB/MM provides two
styles of displays:¹⁴ a static (.DS) style and a floating (.DF) style. In the static style, the display appears
in the same relative position in the output text as it does in the input text; this may result in extra
white space at the bottom of the page if the display is too big to fit there. In the floating style, the
display “floats” through the input text to the top of the next page if there is not enough room for it on
the current page; thus the input text that follows a floating display may precede it in the output text. A
queue of floating displays is maintained so that their relative order is not disturbed.


By default, a display is processed in no-fill mode and is not indented from the existing margin. The
user can specify indentation or centering, as well as fill-mode processing. 


Displays and footnotes {8} may never be nested, in any combination whatsoever. Although lists {5} 
and paragraphs {4.1} are permitted, no headings (.H or .HU) can occur within displays or footnotes. 


7.1 Static Displays 


.DS [format] [fill] 
one or more lines of text 
.DE 


A static display is started by the .DS macro and terminated by the .DE macro. With no arguments,
.DS will accept the lines of text exactly as they are typed (no-fill mode) and will not indent them from
the prevailing indentation. The format argument to .DS is an integer with the following meanings:


    Code   Meaning
    ""     no indent
    0      no indent
    1      indent by standard amount
    2      center each line


The fill argument is also an integer and can have the following meanings:

    Code   Meaning
    ""     no-fill mode
    0      no-fill mode
    1      fill mode

Omitted arguments are taken to be zero.


The standard amount of indentation is taken from the register Si, which is initially 5. Thus, by 
default, the text of an indented display aligns with the first line of indented paragraphs, whose indent is 
contained in the Pi register {4.1}. Even though their initial values are the same, these two registers are 
independent of one another. 


By default, a blank line (½ a vertical space) is placed before and after static and floating displays.
These blank lines before and after static displays can be inhibited by setting the register Ds to 0.
7.2 Floating Displays 


.DF [format] [fill]
one or more lines of text
.DE


A floating display is started by the .DF macro and terminated by the .DE macro. The arguments have
the same meanings as for .DS {7.1}, except that, for floating displays, indent, no indent, and centering
are always calculated with respect to the initial left margin, because the prevailing indent may change
between the time when the formatter first reads the floating display and the time that the display is
printed. One blank line (½ a vertical space) always occurs both before and after a floating display.


14. Displays are processed in an environment that is different from that of the body of the text (see the .ev request in [9]).


7.3 Tables 


.DS
.TS
one or more lines of text to be processed by tbl(1)
.TE
.DE


The .TS (table start) and .TE (table end) macros make possible the use of the tbl(1) processor [11].
They are used only to delimit the text to be examined by tbl(1). Thus, the display function and the
tbl(1) delimiting function are independent of one another, in order to permit one to keep together
blocks that contain any mixture of tables, equations, filled and unfilled text, and caption lines.


If a particular document does not need this flexibility, it is possible to define .TS and .TE so that
they act like .DS and .DE, respectively, and are also recognized by tbl(1):


.de TS
.DS "\\$1" "\\$2"
..
.de TE
.DE
..


If floating tables are desired, substitute .DF for .DS in the above. 


7.4 Equations
.DS
.EQ
equation(s)
.EN
.DE


The equation setters eqn(1) and neqn(1) [6,7] expect to use the .EQ (equation start) and .EN (equation
end) macros as delimiters in the same way that tbl(1) uses .TS and .TE; .EQ and .EN must occur either
inside .DS-.DE pairs or else be defined by the user as shown above for the .TS and .TE macros.


☞ There is an exception to this rule: if .EQ and .EN are used only to specify the delimiters for in-line
equations or to specify eqn/neqn “defines,” .DS and .DE must not be used; otherwise extra blank lines will
appear in the output.


7.5 Figure, Table, and Equation Captions


.FG [title] [override] [flag]
.TB [title] [override] [flag]
.EC [title] [override] [flag]


The .FG (Figure Title), .TB (Table Title), .EC (Equation Caption) macros are normally used inside 
.DS-.DE pairs to automatically number and title figures, tables, and equations. They use registers Fg,
Tb, and Ec, respectively.¹⁵ As an example, the call:


.FG "This is an illustration"
yields: 
Figure 1. This is an illustration 


.TB replaces “Figure” by “TABLE”; .EC replaces “Figure” by “Equation”. Output is centered if it
can fit on a single line; otherwise, all lines but the first are indented to line up with the first character of
the title. The format of the numbers may be changed using the .af request of the formatter.


15. The user may wish to reset these registers after each first-level heading {4.6}. 
The override string is used to modify the normal numbering. If flag is omitted or 0, override is used
as a prefix to the number; if flag is 1, override is used as a suffix; and if flag is 2, override replaces the
number. For example, to produce figures numbered within sections, supply \n(H1 for override on each
.FG call, and reset Fg at the beginning of each section, as shown in {4.6}.
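Here is a Python sketch of how override and flag combine with the automatic number. It models the rules above rather than the macro code, and caption is a hypothetical name. The caption layout follows the “Figure 1. This is an illustration” example, and the prefix “3-” is an invented example of a section-style prefix.

```python
def caption(kind, number, title, override="", flag=0):
    """Sketch of the caption text for .FG, .TB, or .EC.

    kind  -- "Figure", "TABLE", or "Equation"
    flag  -- 0 or omitted: override is a prefix; 1: a suffix; 2: a replacement
    """
    if flag == 1:
        label = f"{number}{override}"
    elif flag == 2:
        label = override
    else:
        label = f"{override}{number}"
    return f"{kind} {label}. {title}"


print(caption("Figure", 1, "This is an illustration"))
# Figure 1. This is an illustration
print(caption("Figure", 2, "Layout", "3-"))
# Figure 3-2. Layout
```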


As a matter of style, table headings are usually placed ahead of the text of the tables, while figure
and equation captions usually occur after the corresponding figures and equations. 


7.6 Blocks of Filled Text 


One can obtain blocks of filled text through the use of .DS or .DF. However, to have the block of
filled text centered within the current line length, the tbl(1) program may be used:


.DS 0 1
.TS
center;
lw40 .
T{
...
T}
.TE
.DE


The “.DS 0 1” begins a non-indented, filled display. The tbl(1) parameters set up a centered table
with a column width of 40 ens. The “T{ ... T}” sequence allows filled text to be input as data within a
table.


8. FOOTNOTES 


There are two macros that delimit the text of footnotes,¹⁶ a string used to automatically number the
footnotes, and a macro that specifies the style of the footnote text. 


8.1 Automatic Numbering of Footnotes 


Footnotes may be automatically numbered by typing the three characters “\*F” immediately after the
text to be footnoted, without any intervening spaces. This will place the next sequential footnote
number (in a smaller point size) a half-line above the text to be footnoted.


8.2 Delimiting Footnote Text 
There are two macros that delimit the text of each footnote: 


.FS [label]
one or more lines of footnote text
.FE


The .FS (footnote start) marks the beginning of the text of the footnote, and the .FE marks its end.
The label on the .FS, if present, will be used to mark the footnote text. Otherwise, the number
retrieved from the string F will be used. Note that automatically-numbered and user-labeled footnotes
may be intermixed. If a footnote is labeled (.FS label), the text to be footnoted must be followed by
label, rather than by “\*F”. The text between .FS and .FE is processed in fill mode. Another .FS, a
.DS, or a .DF is not permitted between the .FS and .FE macros. Examples:


16. Footnotes are processed in an environment that is different from that of the body of the text (see the .ev request in [9]). 




1. Automatically-numbered footnote: 


This is the line containing the word\*F
.FS
This is the text of the footnote.
.FE
to be footnoted.


2. Labeled footnote:


This is a labeled*
.FS *
The footnote is labeled with an asterisk.
.FE
footnote.


The text of the footnote (enclosed within the .FS-.FE pair) should immediately follow the word to be
footnoted in the input text, so that “\*F” or label occurs at the end of a line of input and the next line
is the .FS macro call. It is also good practice to append an unpaddable space {3.3} to “\*F” or label
when they follow an end-of-sentence punctuation mark (i.e., period, question mark, exclamation
point).


Appendix C illustrates the various available footnote styles as well as numbered and labeled foot- 
notes. 


8.3 Format of Footnote Text
.FD [arg] [1]


Within the footnote text, the user can control the formatting style by specifying text hyphenation, right
margin justification, and text indentation, as well as left- or right-justification of the label when text
indenting is used. The .FD macro is invoked to select the appropriate style. The first argument is a
number from the left column of the following table. The formatting style for each number is given by
the remaining four columns. For further explanation of the first two of these columns, see the
definitions of the .ad, .hy, .na, and .nh requests in [9].


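Arg  Hyphenation  Adjustment  Text indentation  Label justification
 0   .nh          .ad         text indent       left
 1   .hy          .ad         text indent       left
 2   .nh          .na         text indent       left
 3   .hy          .na         text indent       left
 4   .nh          .ad         no text indent    left
 5   .hy          .ad         no text indent    left
 6   .nh          .na         no text indent    left
 7   .hy          .na         no text indent    left
 8   .nh          .ad         text indent       right
 9   .hy          .ad         text indent       right
10   .nh          .na         text indent       right
11   .hy          .na         text indent       right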


If the first argument to .FD is out of range, the effect is as if .FD 0 were specified. If the first argu-
ment is omitted or null, the effect is equivalent to .FD 10 in nroff and to .FD 0 in troff; these are also
the respective initial defaults.
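The table follows a regular pattern. Odd numbers hyphenate, and the adjustment alternates in pairs. Numbers 4 through 7 drop the text indent, and 8 through 11 right-justify the label. The hypothetical Python decoder below (`fd_style` is an illustrative name, not part of PWB/MM) makes that pattern and the defaulting rules explicit:

```python
def fd_style(arg=None, formatter="nroff"):
    """Decode a .FD style number into the four table columns (a model only)."""
    if arg is None or str(arg).strip() == "":
        n = 10 if formatter == "nroff" else 0   # omitted/null: nroff 10, troff 0
    else:
        n = int(arg)
        if not 0 <= n <= 11:
            n = 0                               # out of range behaves like .FD 0
    return {
        "hyphenation": ".hy" if n % 2 else ".nh",          # odd numbers hyphenate
        "adjust": ".ad" if (n // 2) % 2 == 0 else ".na",   # pairs alternate .ad/.na
        "text_indent": n < 4 or n >= 8,                    # 4-7 have no text indent
        "label": "right" if n >= 8 else "left",            # 8-11 right-justify label
    }


print(fd_style(None, "nroff") == fd_style(10))  # True
```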


If a second argument is specified, then whenever a first-level heading is encountered, automatically- 
numbered footnotes begin again with 1. This is most useful with the ‘‘section-page’’ page numbering 
scheme. As an example, the input line: 


.FD "" 1


maintains the default formatting style and causes footnotes to be numbered afresh after each first-level 
heading. 
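The numbering rule can be modeled as a counter. Each “\*F” advances it, and a first-level heading clears it only when .FD received a second argument. The class below is a hypothetical Python illustration of that rule, not the macro package's implementation. It ignores labeled footnotes entirely.

```python
class FootnoteCounter:
    r"""Model of automatic footnote numbering (\*F), with the optional
    per-first-level-heading reset enabled by a second .FD argument."""

    def __init__(self, reset_at_first_level: bool = False) -> None:
        self.reset_at_first_level = reset_at_first_level
        self.count = 0

    def heading(self, level: int) -> None:
        # Only first-level headings restart the numbering, and only if enabled.
        if level == 1 and self.reset_at_first_level:
            self.count = 0

    def next_number(self) -> int:
        # Each \*F yields the next sequential footnote number.
        self.count += 1
        return self.count


c = FootnoteCounter(reset_at_first_level=True)
c.next_number(); c.next_number(); c.heading(1)
print(c.next_number())  # 1
```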
For long footnotes that continue onto the following page, it is possible that, if hyphenation is
permitted, the last line of the footnote on the current page will be hyphenated. Except for this case (over
which the user has control by specifying an even argument to .FD), hyphenation across pages is inhi- 
bited by PWB/MM. 


Footnotes are separated from the body of the text by a short rule. Footnotes that continue to the 
next page are separated from the body of the text by a full-width rule. In troff, footnotes are set in type
that is two points smaller than the point size used in the body of the text. 


8.4 Spacing between Footnote Entries 


Normally, one blank line (a three-point vertical space) separates the footnotes when more than one 
occurs on a page. To change this spacing, set the register Fs to the desired value. For example: 


.nr Fs 2
will cause two blank lines (a six-point vertical space) to occur between footnotes. 


9. PAGE HEADERS AND FOOTERS 


Text that occurs at the top of each page is known as the page header. Text printed at the bottom of 
each page is called the page footer. There can be up to three lines of text associated with the header:
every page, even page only, and odd page only. Thus the page header may have up to two lines of text:
the line that occurs at the top of every page and the line for the even- or odd-numbered page. The
same is true for the page footer.


This section first describes the default appearance of page headers and page footers, and then the 
ways of changing them. We use the term header (not qualified by even or odd) to mean the line of the
page header that occurs on every page, and similarly for the term footer.

9.1 Default Headers and Footers


By default, each page has a centered page number as the header {9.2}. There is no default footer and 
no even/odd default headers or footers, except as specified in {9.9}.


In a memorandum or a released paper, the page header on the first page is automatically suppressed 
provided a break does not occur before .MT is called. The macros and text of {6.9} and of {9} as well as
.nr and .ds requests do not cause a break and are permitted before the .MT macro call.


9.2 Page Header 


.PH [arg]
For this and for the .EH, .OH, .PF, .EF, .OF macros, the argument is of the form: 
"'left-part'center-part'right-part'"


If it is inconvenient to use the apostrophe (') as the delimiter (i.e., because it occurs within one of the
parts), it may be replaced uniformly by any other character. On output, the parts are left-justified, cen-
tered, and right-justified, respectively. See {9.11} for examples.
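The parsing rule can be sketched in Python. This is a model of the behavior described above. It assumes that the first character of the argument is the delimiter and that missing parts are empty. The names `split_title` and `layout` are hypothetical, and overlapping parts are simply overwritten here rather than handled the way the formatter's .tl request handles them.

```python
def split_title(arg: str) -> tuple:
    """Split a three-part title such as "'left'center'right'" into its parts.

    The first character is taken as the delimiter, so any character may
    replace the apostrophe.
    """
    if not arg:
        return ("", "", "")
    delim = arg[0]
    parts = arg[1:].split(delim) + ["", "", ""]
    return tuple(parts[:3])


def layout(parts: tuple, width: int) -> str:
    """Place the parts left-justified, centered, and right-justified."""
    left, center, right = parts
    line = [" "] * width

    def put(text: str, start: int) -> None:
        for i, ch in enumerate(text):
            if 0 <= start + i < width:
                line[start + i] = ch

    put(left, 0)
    put(center, (width - len(center)) // 2)
    put(right, width - len(right))
    return "".join(line)


print(split_title("'''Page 3'"))  # ('', '', 'Page 3')
```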


The .PH macro specifies the header that is to appear at the top of every page. The initial value (as 
stated in {9.1}) is the default centered page number enclosed by hyphens. See the top of this page for 
an example of this default header. 


If debug mode is set using the flag -rD1 on the command line {2.4}, additional information, printed
at the top left of each page, is included in the default header. This consists of the SCCS [10] Release
and Level of PWB/MM (thus identifying the current version {11.3}), followed by the current line number
within the current input file.


9.3 Even-Page Header 
.EH [arg]


The .EH macro supplies a line to be printed at the top of each even-numbered page, immediately
following the header. The initial value is a blank line.
9.4 Odd-Page Header

.OH [arg] 
This macro is the same as .EH, except that it applies to odd-numbered pages. 
9.5 Page Footer 

.PF [arg] 


The .PF macro specifies the line that is to appear at the bottom of each page. Its initial value is a blank 
line. If the -rCn flag is specified on the command line {2.4}, the type of copy follows the footer on a
separate line. In particular, if -rC3 (DRAFT) is specified, then, in addition, the footer is initialized to
contain the date {6.7.1}, instead of being a blank line.


9.6 Even-Page Footer 
.EF [arg] 


The .EF macro supplies a line to be printed at the bottom of each even-numbered page, immediately
preceding the footer. The initial value is a blank line. 


9.7 Odd-Page Footer 

.OF [arg] 
This macro is the same as .EF, except that it applies to odd-numbered pages.
9.8 Footer on the First Page 


By default, the footer is a blank line. If, in the input text, one specifies .PF and/or .OF before the end
of the first page of the document, then these lines will appear at the bottom of the first page.


The header (whatever its contents) replaces the footer on the first page only if the -rN1 flag is
specified on the command line {2.4}. 


9.9 Default Header and Footer with “Section-Page” Numbering


Pages can be numbered sequentially within sections {4.5}. To obtain this numbering style, specify -rN3
on the command line. In this case, the default footer is a centered “section-page” number, e.g., 3-5,
and the default page header is blank.


9.10 Use of Strings and Registers in Header and Footer Macros


String and register names may be placed in the arguments to the header and footer macros. If the
value of the string or register is to be computed when the respective header or footer is printed, the invoca-
tion must be escaped by four (4) backslashes. This is because the string or register invocation will be
processed three times:


• as the argument to the header or footer macro;
• in a formatting request within the header or footer macro;
• in a .tl request during header or footer processing.


For example, the page number register P must be escaped with four backslashes in order to specify a
header in which the page number is to be printed at the right margin, e.g.:


.PH "'''Page \\\\nP'"


creates a right-justified header containing the word “Page” followed by the page number. Similarly, to
specify a footer with the “section-page” style, one specifies (see {4.2.2.5} for the meaning of H1):


.PF "''- \\\\n(H1-\\\\nP -''"


As another example, suppose that the user arranges for the string a] to contain the current section
heading which is to be printed at the bottom of each page. The .PF macro call would then be:
.PF "''\\\\*(a]''"


If only one or two backslashes were used, the footer would print a constant value for a], namely, its
value when the .PF appeared in the input text.
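The “four backslashes” rule can be checked with a small model. Each processing pass before the last halves the run of backslashes, so a register is interpolated on the pass at which only one backslash remains. The Python below is an illustrative model of that reasoning, not a description of the formatter's internals. Both names, `PASSES` and `interpolation_pass`, are hypothetical.

```python
PASSES = (
    "argument to the header or footer macro",          # when .PH/.PF is read
    "formatting request inside that macro",            # still at .PH/.PF time
    ".tl request during header or footer processing",  # once per printed page
)


def interpolation_pass(escape: str) -> int:
    r"""Return the 1-based pass at which an escape such as \\\\nP is used.

    Model: every pass before interpolation halves the run of leading
    backslashes; a single remaining backslash is interpreted on that pass.
    """
    backslashes = len(escape) - len(escape.lstrip("\\"))
    passes = 1
    while backslashes > 1:
        backslashes //= 2
        passes += 1
    return passes


print(PASSES[interpolation_pass("\\\\\\\\nP") - 1])  # the .tl pass
```

Under this model, one or two backslashes are used up by the time the .PF line itself has been processed, so the value is frozen at that point. This agrees with the paragraph above.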
9.11 Header and Footer Example


The following sequence specifies blank lines for the header and footer lines, page numbers on the out-
side edge of each page (i.e., top left margin of even pages and top right margin of odd pages), and
“Revision 3” on the top inside margin of each page:


.PH ""
.PF ""
.EH "'\\\\nP''Revision 3'"
.OH "'Revision 3''\\\\nP'"


9.12 Generalized Top-of-Page Processing
This section is intended only for users accustomed to writing formatter macros.


During header processing, PWB/MM invokes two user-definable macros. One, the .TP macro, is invoked
in the environment (see .ev request in [9]) of the header; the other, .PX, is a user-exit macro that is
invoked (without arguments) when the normal environment has been restored, and with “no-space”
mode already in effect.


The effective initial definition of .TP (after the first page of a document) is: 


.de TP
.sp
.tl \\*(}t
.if e 'tl \\*(}e
.if o 'tl \\*(}o
.sp
..


The string }t contains the header, the string }e contains the even-page header, and the string }o contains
the odd-page header, as defined by the .PH, .EH, and .OH macros, respectively. To obtain more spe-
cialized page titles, the user may redefine the .TP macro to cause any desired header processing {11.5}.
Note that formatting done within the .TP macro is processed in an environment different from that of 
the body. 


For example, to obtain a page header that includes three centered lines of data, say, a document's
number, issue date, and revision date, one could define .TP as follows:


.de TP
.sp
.ce 3
777-888-999
Iss. 2, AUG 1977
Rev. 7, SEP 1977
.sp
..


The .PX macro may be used to provide text that is to appear at the top of each page after the normal
header and that may have tab stops to align it with columns of text in the body of the document. 


9.13 Generalized Bottom-of-Page Processing 


The facility to permit user-defined processing for the bottom of each page is not currently available.


10. TABLE OF CONTENTS AND COVER SHEET 


The table of contents and the cover sheet for a document are produced by invoking the .TC and .CS
macros, respectively. The appropriate -rBn option {2.4} must also be specified on the command line.
These macros should normally appear only once at the end of the document, after the Signature Block
{6.11.1} and Notations {6.11.2} macros. They may occur in either order.




The table of contents is produced at the end of the document because the entire document must be 
processed before the table of contents can be generated. Similarly, the cover sheet is often not needed, 
and is therefore produced at the end. 


10.1 Table of Contents 
.TC [slevel] [spacing] [tlevel] [tab] [head1] [head2] [head3] [head4] [head5]


The .TC macro generates a table of contents containing the headings that were saved for the table of
contents as determined by the value of the Cl register {4.4}. Note that -rB1 or -rB3 {2.4} must also be
specified to the formatter on the command line. The arguments to .TC control the spacing before each
entry, the placement of the associated page number, and additional text on the first page of the table of
contents before the word “CONTENTS.”


Spacing before each entry is controlled by the first two arguments; headings whose level is less than
or equal to slevel will have spacing blank lines (halves of a vertical space) before them. Both slevel and
spacing default to 1. This means that first-level headings are preceded by one blank line (½ a vertical
space). Note that slevel does not control what levels of heading have been saved; the saving of headings
is the function of the Cl register {4.4}.


The third and fourth arguments control the placement of the page number for each heading. The
page numbers can be justified at the right margin with either blanks or dots (“leaders”) separating the
heading text from the page number, or the page numbers can follow the heading text. For headings
whose level is less than or equal to tlevel (default 2), the page numbers are justified at the right margin.
In this case, the value of tab determines the character used to separate the heading text from the page
number. If tab is 0 (the default value), dots (i.e., leaders) are used; if tab is greater than 0, spaces are
used. For headings whose level is greater than tlevel, the page numbers are separated from the heading
text by two spaces (i.e., they are “ragged right”).
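Taken together, the four arguments form a small decision rule for each entry. The sketch below models that rule in Python. The line width, the use of plain dots for leaders, and the name `toc_entry` are assumptions for illustration. The real formatter measures in machine units rather than in characters.

```python
def toc_entry(level, text, page, width=60, slevel=1, spacing=1, tlevel=2, tab=0):
    """Return (blank_halves_before, line) for one table-of-contents entry.

    Model of the .TC rules; `width` is an assumed line length in characters.
    """
    blank_halves = spacing if level <= slevel else 0
    page = str(page)
    if level <= tlevel:
        fill = "." if tab == 0 else " "          # leaders by default, blanks if tab > 0
        gap = max(width - len(text) - len(page), 1)
        line = text + fill * gap + page          # page number at the right margin
    else:
        line = f"{text}  {page}"                 # ragged right: two spaces
    return blank_halves, line


print(toc_entry(1, "1. INTRO", 3, width=20))  # (1, '1. INTRO...........3')
```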


All additional arguments (e.g., head1, head2, etc.), if any, are horizontally centered on the page, and
precede the actual table of contents itself.


If the .TC macro is invoked with at most four arguments, then the user-exit macro .TX is invoked 
(without arguments) before the word “CONTENTS” is printed. By defining .TX and invoking .TC
with at most four arguments, the user can specify what needs to be done at the top of the (first) page 
of the table of contents. For example, the following input: 


.de TX
.ce 2
Special Application
Message Transmission
.sp 2
.in +10n
Approved: \l'3i'
.in
.sp
..
.TC

yields: 


Special Application
Message Transmission


Approved: ______________________


CONTENTS
10.2 Cover Sheet
.CS [pages] [other] [total] [figs] [tbls] [refs]


The .CS macro generates a cover sheet in either the TM or released-paper style.¹⁷ All of the other
information for the cover sheet is obtained from the data given before the .MT macro call {6.9}. If the
released-paper style is used, all arguments to .CS are ignored. If a memorandum style is used, the .CS
macro generates the “Cover Sheet for Technical Memorandum.” The arguments provide the data that
appears in the lower left corner of the TM cover sheet [2]: the number of pages of text, the number of
other pages, the total number of pages, the number of figures, the number of tables, and the number 
of references. 


11. MISCELLANEOUS FEATURES 
11.1 Bold, Italic, and Roman 


.B [bold-arg] [previous-font-arg]
.I [italic-arg] [previous-font-arg]
.R


When called without arguments, .B (or .I) changes the font to bold (or italic) in troff, and initiates
underlining in nroff.¹⁸ This condition continues until the occurrence of a .R, when the regular roman
font is restored. Thus,


.I
here is some text.
.R


yields:
here is some text.


If .B or .I is called with one argument, that argument is printed in the appropriate font (underlined in
nroff). Then the previous font is restored (underlining is turned off in nroff). If two arguments are
given to a .B or .I, the second argument is then concatenated to the first with no intervening space, but
is printed in the previous font (not underlined in nroff). For example:


.I italic
text
.I right -justified


produces:

italic text right-justified
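The argument handling can be summarized as a mapping from arguments to font runs. This Python model is illustrative only: `font_macro` and the (text, font) run representation are hypothetical. The returned runs are meant to be joined with no intervening space, as described above, and nroff underlining is not modeled.

```python
def font_macro(font, args, previous="R"):
    """Model .B/.I: return (runs, font_afterwards); runs are (text, font) pairs."""
    if not args:
        return [], font                   # no arguments: switch font until .R
    runs = [(args[0], font)]              # first argument in the new font
    if len(args) > 1:
        runs.append((args[1], previous))  # second argument in the previous font
    return runs, previous                 # previous font is restored


runs, after = font_macro("I", ["right", "-justified"])
print("".join(text for text, _ in runs), after)  # right-justified R
```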


One can use both bold and italic fonts if one intends to use troff, but the nroff version of the output
does not distinguish between bold and italic. It is probably a good idea to use .I only, unless bold is
truly required. Note that font changes in headings are handled separately {4.2.2.4.1}.


Anyone using a terminal that cannot underline might wish to insert: 


.rm ul
.rm cu


at the beginning of the document to eliminate all underlining.
11.2 Justification of Right Margin 
.SA [arg]


The .SA macro is used to set right-margin justification for the main body of text. Two justification flags
are used: current and default. .SA 0 sets both flags to no justification, i.e., it acts like the .na request.
.SA 1 is the inverse: it sets both flags to cause justification, just like the .ad request. However, calling


17. But only if -rB2 or -rB3 has been specified on the command line.
18. For ease of explanation, in this section {11.1} troff behavior is described first, the convention of {1.2} notwithstanding.
.SA without an argument causes the current flag to be copied from the default flag, thus performing
either a .na or .ad, depending on what the default is. Initially, both flags are set for no justification in
nroff and for justification in troff.


In general, the request .na can be used to ensure that justification is turned off, but .SA should be 
used to restore justification, rather than the .ad request. In this way, justification or lack thereof for the 
remainder of the text is specified by inserting .SA 0 or .SA 1 once at the beginning of the document.
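The interplay between the two flags is easier to see when written out as state. The class below is a hypothetical Python model, not PWB/MM code. It assumes, as the paragraphs above imply, that the .na request changes only the current state, while .SA with an argument sets both flags.

```python
class Justification:
    """Model of the two .SA flags (current and default)."""

    def __init__(self, formatter: str = "nroff") -> None:
        initial = formatter == "troff"     # justified in troff, not in nroff
        self.current = self.default = initial

    def sa(self, arg=None) -> None:
        if arg is None:
            self.current = self.default    # .SA: restore from the default flag
        else:
            self.current = self.default = bool(int(arg))  # .SA 0 / .SA 1

    def na(self) -> None:
        self.current = False               # .na changes only the current state


j = Justification("troff")
j.na()
j.sa()
print(j.current)  # True: justification restored from the default
```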


11.3 SCCS Release Identification 
The string RE contains the SCCS [10] Release and Level of the current version of PWB/MM. For exam-
ple, typing:
This is version \*(RE of the macros. 
produces: 
This is version 12.2 of the macros. 


This information is useful in analyzing suspected bugs in PWB/MM. The easiest way to have this 
number appear in your output is to specify -rD1 {2.4} on the command line, which causes the string RE 
to be output as part of the page header {9.2}.


11.4 Two-Column Output 
PWB/MM can print two columns on a page:


.2C 
text and formatting requests (except another .2C) 
.1C


The .2C macro begins two-column processing which continues until a .1C macro is encountered. In
two-column processing, each physical page is thought of as containing two columnar “pages” of equal
(but smaller) “page” width. Page headers and footers are not affected by two-column processing. The
.1C macro does not “balance” two-column output.


11.5 Column Headings for Two-Column Output
This section is intended only for users accustomed to writing formatter macros.


In two-column output, it is sometimes necessary to have headers over each column, as well as headers 
over the entire page {9}. This is accomplished by redefining the .TP macro {9.12} to provide header
lines both for the entire page and for each of the columns. For example: 


.de TP
.sp 2
.tl 'Page \\nP'OVERALL''
.tl ''TITLE''
.sp
.nf
.ta 16C 31R 34 50C 65R
left→center→right→left→center→right      (where → stands for the tab character)
→first column→→→second column
.fi
.sp 2
..


The above example will produce two lines of page header text plus two lines of headers over each 
column. The tab stops are for a 65-en overall line length. 


11.6 Vertical Spacing 
.SP [lines]
There exist several ways of obtaining vertical spacing, all with different effects. 




The .sp request spaces the number of lines specified, unless “no space” (.ns) mode is on, in which
case the request is ignored. This mode is typically set at the end of a page header in order to eliminate
spacing by a .sp or .bp request that just happens to occur at the top of a page. This mode can be turned
off via the .rs (“restore spacing”) request.


The .SP macro is used to avoid the accumulation of vertical space by successive macro calls.
Several .SP calls in a row produce not the sum of their arguments, but their maximum; i.e., the follow-
ing produces only 3 blank lines:


.SP 2
.SP 3
.SP


Many PWB/MM macros utilize .SP for spacing. For example, “.LE 1” {5.3.2} immediately followed by
“.P” {4.1} produces only a single blank line (½ a vertical space) between the end of the list and the
following paragraph. An omitted argument defaults to one blank line (one vertical space). Unscaled
fractional amounts are permitted; like .sp, .SP is also inhibited by the .ns request.
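The “maximum, not sum” rule is the core behavior of .SP. Below is a minimal Python model of it. The name `sp_run` is hypothetical. The model treats a run as consecutive .SP calls with no intervening text. As a further assumption, it lets .ns mode suppress the whole run.

```python
def sp_run(args, no_space=False):
    """Blank lines produced by consecutive .SP calls (None = omitted argument)."""
    if no_space or not args:
        return 0                                   # .ns mode inhibits .SP
    return max(1 if a is None else a for a in args)


print(sp_run([2, 3, None]))  # 3, not 6
```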


11.7 Skipping Pages 
.SK [pages]


The .SK macro skips pages, but retains the usual header and footer processing. If pages is omitted,
null, or 0, .SK skips to the top of the next page unless it is currently at the top of a page, in which case
it does nothing. .SK n skips n pages. That is, .SK always positions the text that follows it at the top of
a page, while .SK 1 always leaves one page that is blank except for the header and footer.
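These rules can be restated in terms of page breaks. The Python function below is an interpretive model, not PWB/MM code, and its name is hypothetical. It assumes that when .SK is reached at the top of a page, that still-empty page counts as the first skipped page. Under this assumption, .SK 1 leaves exactly one header-and-footer-only page in both cases.

```python
def skip_pages(pages=None, at_top_of_page=False):
    """Model .SK: how many page breaks occur and how many pages are left blank."""
    n = int(pages or 0)                        # omitted, null, or 0 all mean 0
    breaks = n if at_top_of_page else n + 1    # an unfinished page is ended first
    return {"blank_pages": n, "page_breaks": breaks}


print(skip_pages(None, at_top_of_page=True))  # {'blank_pages': 0, 'page_breaks': 0}
```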


11.8 Setting Point Size and Vertical Spacing 


In troff, the default point size (obtained from the register S {2.4}) is 10, with a vertical spacing of 12
points (i.e., 6 lines per inch). The prevailing point size and vertical spacing may be changed by invok-
ing the .S macro:


.S [arg]


If arg is null, the previous point size is restored. If arg is negative, the point size is decremented by the 
specified amount. If arg is signed positive, the point size is incremented by the specified amount, and if 
arg is unsigned, it is used as the new point size; if arg is greater than 99, the default point size (10) is 
restored. Vertical spacing is always two points greater than the point size.¹⁹
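The argument forms can be summarized as a single function. This is a Python model of the rules just listed. The name `s_macro` is hypothetical. The model assumes that the greater-than-99 check applies only to an unsigned argument.

```python
DEFAULT_POINT_SIZE = 10   # from register S; 10 unless set on the command line


def s_macro(arg, current, previous, default=DEFAULT_POINT_SIZE):
    """Model .S: return (new_point_size, vertical_spacing) in points."""
    if arg is None or arg == "":
        size = previous                        # null: restore previous size
    elif arg.startswith("-"):
        size = current - int(arg[1:])          # negative: decrement
    elif arg.startswith("+"):
        size = current + int(arg[1:])          # signed positive: increment
    else:
        size = int(arg)                        # unsigned: absolute size
        if size > 99:
            size = default                     # too large: default restored
    return size, size + 2                      # spacing is size + 2 points


print(s_macro("12", 10, 10))  # (12, 14)
```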


12. ERRORS AND DEBUGGING
12.1 Error Terminations 
When a macro discovers an error, the following actions occur: 


• A break occurs.


• To avoid confusion regarding the location of the error, the formatter output buffer (which may con-
tain some text) is printed.


• A short message is printed giving the name of the macro that found the error, the type of error, and
the approximate line number (in the current input file) of the last processed input line. (All the
error messages are explained in Appendix E.)


• Processing terminates, unless the register D {2.4} has a positive value. In the latter case, processing
continues even though the output is guaranteed to be deranged from that point on.


The error message is printed by writing it directly to the user's terminal. If an output filter, such as
gsi(1), 450(1), or hp(1), is being used to post-process nroff output, the message may be garbled by being
intermixed with text held in that filter's output buffer.


19. Footnotes {8} are printed in a size two points smaller than the point size of the body, with an additional vertical spacing of
three points between footnotes.
If either tbl(1) or eqn(1)/neqn(1), or both, are being used, and if the -olist option of the formatter causes
the last page of the document not to be printed, a harmless “broken pipe” message results.


12.2 Disappearance of Output 


This usually occurs because of an unclosed diversion (e.g., missing .FE or .DE). Fortunately, the mac- 
ros that use diversions are careful about it, and they check to make sure that illegal nestings do not 
occur. If any message is issued about a missing .DE or .FE, the appropriate action is to search back- 
wards from the termination point looking for the corresponding .DS, .DF, or .FS. 


The following command: 
grep -n "^\.[EDFT][EFNQS]" files ...


prints all the .DS, .DF, .DE, .FS, .FE, .TS, .TE, .EQ, and .EN macros found in files..., each preceded 
by its file name and the line number in that file. This listing can be used to check for illegal nesting 
and/or omission of these macros. 
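The grep listing still has to be checked by eye. A small script can do the pairing automatically. The Python sketch below is a hypothetical helper, not part of PWB/MM. It reuses the same pattern and assumes that tables and equations may appear inside displays, as in {7.6}. It flags only three kinds of problem: mismatched closers, unclosed openers, and a .FS, .DS, or .DF inside a footnote, as stated in {8.2}.

```python
import re
import sys

OPENERS = {".DS": ".DE", ".DF": ".DE", ".FS": ".FE", ".TS": ".TE", ".EQ": ".EN"}
CLOSERS = set(OPENERS.values())
PATTERN = re.compile(r"^\.[EDFT][EFNQS]")   # same pattern as the grep command


def check_nesting(lines, name="<input>"):
    """Return a list of "file:line: message" strings describing nesting problems."""
    problems, stack = [], []
    for lineno, line in enumerate(lines, start=1):
        match = PATTERN.match(line)
        if not match:
            continue
        macro = match.group(0)
        if macro in OPENERS:
            # .FS, .DS, and .DF are not permitted between .FS and .FE.
            if macro in (".FS", ".DS", ".DF") and any(m == ".FS" for m, _ in stack):
                problems.append(f"{name}:{lineno}: {macro} inside .FS/.FE")
            stack.append((macro, lineno))
        elif macro in CLOSERS:
            if stack and OPENERS[stack[-1][0]] == macro:
                stack.pop()
            else:
                problems.append(f"{name}:{lineno}: unexpected {macro}")
        # Other matches (e.g. .EF) are not paired macros and are ignored.
    problems.extend(f"{name}:{n}: unclosed {m}" for m, n in stack)
    return problems


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as handle:
            for problem in check_nesting(handle.read().splitlines(), path):
                print(problem)
```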


13. EXTENDING AND MODIFYING THE MACROS

13.1 Naming Conventions 

In this section, the following conventions are used to describe legal names:
n    digit
a    lower-case letter
A    upper-case letter
x    any letter or digit (any alphanumeric character)
s    special character (any non-alphanumeric character)


All other characters are literals (i.e., stand for themselves). 


Note that request, macro, and string names are kept by the formatters in a single internal table, so
that there must be no duplication among such names. Number register names are kept in a separate
table.


13.1.1 Names Used by Formatters.


requests:   aa (most common)
            an (only one, currently: .c2)
registers:  aa (normal)
            .x (normal)
            .s (only one, currently: .$)
            %  (page number)


13.1.2 Names Used by PWB/MM. 


macros: AA (most common, accessible to user) 
A (less common, accessible to user) 
)x (internal, constant) 
>x (internal, dynamic) 


strings: AA (most common, accessible to user) 
A (less common, accessible to user) 
]x (internal, usually allocated to specific functions throughout) 
}x (internal, more dynamic usage)


registers: Aa (most common, accessible to users) 
An (common, accessible to user) 
A (accessible, set on command line) 
:x (mostly internal, rarely accessible, usually dedicated) 
;x (internal, dynamic, temporaries)
13.1.3 Names Used by EQN/NEQN and TBL. The equation preprocessors, eqn(1) and neqn(1), use
registers and string names of the form an. The table preprocessor, tbl(1), uses names of the form:


a- a a| nn #a ee) O#- #* “a T& TW 


13.1.4 User-Definable Names. After the above, what is left for user extensions? To avoid problems, 
we suggest using names that consist either of a single lower-case letter, or of a lower-case letter fol- 
lowed by anything other than a lower-case letter. The following is a sample naming convention: 


macros: aA 

Aa 
strings: a 

a) (or a], or a}, etc.)
registers: a

aA 
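The suggested rule is mechanical enough to check in code. The hypothetical Python helper below only illustrates the convention above. It does not know about names that a given document or preprocessor has already claimed.

```python
import string


def follows_suggested_convention(name: str) -> bool:
    """Check a proposed user macro/string/register name against the suggestion:
    a single lower-case letter, or a lower-case letter followed by any
    character that is not a lower-case letter."""
    lower = string.ascii_lowercase
    if len(name) == 1:
        return name in lower
    if len(name) == 2:
        return name[0] in lower and name[1] not in lower
    return False   # formatter names are at most two characters


print([follows_suggested_convention(n) for n in ("a", "aA", "a]", "aa", "Aa")])
```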


13.2 Sample Extensions 
13.2.1 Appendix Headings. The following gives a way of generating and numbering appendices: 


.nr Hu 1
.nr a 0
.de aH
.nr a +1
.nr P 0
.PH "'''Appendix \\na-\\\\\\\\nP'"
.SK
.HU "\\$1"
..


After the above initialization and definition, each call of the form “.aH "title"” begins a new page
(with the page header changed to “Appendix a-n”) and generates an unnumbered heading of title,
which, if desired, can be saved for the table of contents. Those who wish Appendix titles to be cen-
tered must, in addition, set the register Hc to 1 {4.2.2.3}.


13.2.2 Hanging Indent with Tabs. The following example illustrates the use of the hanging-indent
feature of variable-item lists {5.3.3.6}. First, a user-defined macro is built to accept four arguments that
make up the mark. Each argument is to be separated from the previous one by a tab character; tab set-
tings are defined later. Since the first argument may begin with a period or apostrophe, the “\&” is
used so that the formatter will not interpret such a line as a formatter request or macro.²⁰ The “\t” is
translated by the formatter into a tab character. The “\c” is used to concatenate the line of text that
follows the macro to the line of text built by the macro. The macro definition and an example of its
use are as follows:


20. The two-character sequence “\&” is understood by the formatters to be a “zero-width” space, i.e., it causes no output
characters to appear.
.de aX
.LI
\&\\$1\t\\$2\t\\$3\t\\$4\t\c
..


nh 


ehy 
ic 


e¢ 


an @ @ 


.ta 9n 18n 27n 36n


.VL 36


.aX .nh off \- no


No hyphenation. 
Automatic hyphenation is turned off. 
Words containing hyphens 


(e.g., mother-in-law) may still be split across lines. 


.aX .hy on \- no 


Hyphenate. 

Automatic hyphenation is turned on. 

.aX .hc□c none none no      (□ stands for a space)
Hyphenation indicator character is set to ‘‘c’’ or removed. 

During text processing the indicator is suppressed 


and will not appear in the output. 


Prepending the indicator to a word has the effect 
of preventing hyphenation of that word. 


.LE
The resulting output is: 
.nh      off      -        no       No hyphenation. Automatic hyphenation is turned off. Words
                                    containing hyphens (e.g., mother-in-law) may still be split
                                    across lines.
.hy      on       -        no       Hyphenate. Automatic hyphenation is turned on.
.hc□c    none     none     no       Hyphenation indicator character is set to “c” or removed.
                                    During text processing the indicator is suppressed and will not
                                    appear in the output. Prepending the indicator to a word has
                                    the effect of preventing hyphenation of that word.
14. CONCLUSION


The following are the qualities that we have tried to emphasize in PWB/MM, in approximate order of 
importance: 


• Robustness in the face of error—A user need not be an nroff/troff expert to use these macros. When
the input is incorrect, either the macros attempt to make a reasonable interpretation of the error, or
a message describing the error is produced. We have tried to minimize the possibility that a user
would get cryptic system messages or strange output as a result of simple errors.


• Ease of use for simple documents—It is not necessary to write complex sequences of commands to
produce simple documents. Reasonable default values are provided, where at all possible.


• Parameterization—There are many different preferences in the area of document styling. Many
parameters are provided so that users can adapt the output to their respective needs over a wide
range of styles.


• Extension by moderately expert users—We have made a strong effort to use mnemonic naming con-
ventions and consistent techniques in the construction of the macros. Naming conventions are
given so that a user can add new macros or redefine existing ones, if necessary.


• Device independence—The most common use of PWB/MM is to print documents on hard-copy type-
writer terminals, using the nroff formatter. The macros can be used conveniently with both 10- and
12-pitch terminals. In addition, output can be scanned with an appropriate CRT terminal. The mac-
ros have been constructed to allow compatibility with troff, so that output can be produced both on
typewriter-like terminals and on a phototypesetter.


• Minimization of input—The design of the macros attempts to minimize repetitive typing. For exam-
ple, if a user wants to have a blank line after all first- or second-level headings, he or she need only
set a specific parameter once at the beginning of a document, rather than add a blank line after each
such heading.


• Decoupling of input format from output style—There is but one way to prepare the input text, although
the user may obtain a number of output styles by setting a few global flags. For example, the .H 
macro is used for all numbered headings, yet the actual output style of these headings may be made 
to vary from document to document or, for that matter, within a single document. 


Future releases of PWB/MM will provide additional features that are found to be useful. The authors 
welcome comments, suggestions, and criticisms of the macros and of this manual. 
during the implementation of PWB/MM we have generated atypical requirements and encountered
Appendix A: DEFINITIONS OF LIST MACROS
This appendix is intended only for users accustomed to writing formatter macros.
Here are the definitions of the list-initialization macros {5.3.3}:²¹


.de AL
.if !@\\$1@@ .if !@\\$1@1@ .if !@\\$1@a@ .if !@\\$1@A@ .if !@\\$1@I@ .if !@\\$1@i@ .)D "AL:bad arg:\\$1"
.if \\n(.$<3 \{.ie \w@\\$2@=0 .)L \\n(Lin 0 \\n(Lin-\w@\0\0.@u 1 "\\$1"
.el .LB 0\\$2 0 2 1 "\\$1" \}
.if \\n(.$>2 \{.ie \w@\\$2@=0 .)L \\n(Lin 0 \\n(Lin-\w@\0\0.@u 1 "\\$1" 0 1
.el .LB 0\\$2 0 2 1 "\\$1" 0 1 \}
..
.de BL
.nr ;0 \\n(Pi
.if \\n(.$>0 .if \w@\\$1@>0 .nr ;0 0\\$1
.if \\n(.$<2 .LB \\n(;0 0 1 0 \\*(BU
.if \\n(.$>1 .LB \\n(;0 0 1 0 \\*(BU 0 1
.rr ;0
..
.de DL
.nr ;0 \\n(Pi
.if \\n(.$>0 .if \w@\\$1@>0 .nr ;0 0\\$1
.if \\n(.$<2 .LB \\n(;0 0 1 0 \(em
.if \\n(.$>1 .LB \\n(;0 0 1 0 \(em 0 1
.rr ;0
..
.de ML
.if !\\n(.$ .)D "ML:missing arg"
.nr ;0 \w@\\$1@u/3u/\\n(.su+1u \" get size in n's
.if !\\n(.$-1 .LB \\n(;0 0 1 0 "\\$1"
.if \\n(.$-1 .if !\\n(.$-2 .LB 0\\$2 0 1 0 "\\$1"
.if \\n(.$-2 .if !\w@\\$2@ .LB \\n(;0 0 1 0 "\\$1" 0 1
.if \\n(.$-2 .if \w@\\$2@ .LB 0\\$2 0 1 0 "\\$1" 0 1
..
.de RL
.nr ;0 6
.if \\n(.$>0 .if \w@\\$1@>0 .nr ;0 0\\$1
.if \\n(.$<2 .LB \\n(;0 0 2 4
.if \\n(.$>1 .LB \\n(;0 0 2 4 1 0 1
.rr ;0
..
.de VL
.if !\\n(.$ .)D "VL:missing arg"
.if !\\n(.$-2 .LB 0\\$1 0\\$2 0 0
.if \\n(.$-2 .LB 0\\$1 0\\$2 0 0 \& 0 1
..


Any of these can be redefined to produce different behavior: e.g., to provide two spaces between the
bullet of a bullet item and its text, redefine .BL as follows before invoking it:²²

.de BL
.LB 3 0 2 0 \\*(BU
..

21. On this page, @ represents the BEL character, .)D is an internal PWB/MM macro that prints error messages, and .)L is
similar to .LB, except that it expects its arguments to be scaled.


22. With this redefinition, .BL cannot have any arguments.


Appendix B: USER-DEFINED LIST STRUCTURES
This appendix is intended only for users accustomed to writing formatter macros.


If a large document requires complex list structures, it is useful to be able to define the appearance for
each list level only once, instead of having to define it at the beginning of each list. This permits
consistency of style in a large document. For example, a generalized list-initialization macro might be
defined in such a way that what it does depends on the list-nesting level in effect at the time
the macro is called. Suppose that levels 1 through 5 of lists are to have the following appearance:


A.
     [1]
          •
               a)
                    +


The following code defines a macro (.aL) that always begins a new list and determines the type of list
according to the current list level. To understand it, you should know that the number register :g is
used by the PWB/MM list macros to determine the current list level; it is 0 if there is no currently active
list. Each call to a list-initialization macro increments :g, and each .LE call decrements it.


.de aL
.\" register g is used as a local temporary to save :g before it is changed below
.nr g \\n(:g
.if \\ng=0 .AL A \" give me an A.
.if \\ng=1 .LB \\n(Li 0 1 4 \" give me a [1]
.if \\ng=2 .BL \" give me a bullet
.if \\ng=3 .LB \\n(Li 0 2 2 a \" give me an a)
.if \\ng=4 .ML + \" give me a +
..


This macro can be used (in conjunction with .LI and .LE) instead of .AL, .RL, .BL, .LB, and .ML.
For example, the following input: 


.aL
.LI
first line.
.aL
.LI
second line.
.LE
.LI
third line.
.LE


will yield: 
A. first line. 
[1] second line. 
B. third line. 
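The behavior of .aL amounts to a small state machine: the number of active lists (the :g register) selects the mark style of each new list, and .LE pops back to the enclosing list, whose item counter resumes where it left off. The following is a minimal Python model of that bookkeeping, not PWB/MM code; ListState, item_mark, and the four-space indentation are illustrative assumptions, and nesting deeper than the five styles of .aL is not modeled.

```python
# Hypothetical model of .aL/.LI/.LE (not PWB/MM code).
BULLET = "\u2022"
# Style that .aL selects for nesting levels 0..4; deeper levels are not modeled.
STYLES = ["A.", "[1]", BULLET, "a)", "+"]


def item_mark(style: str, n: int) -> str:
    """Mark for the n-th item (1-based) of a list drawn in the given style."""
    if style == "A.":
        return chr(ord("A") + n - 1) + "."
    if style == "[1]":
        return f"[{n}]"
    if style == "a)":
        return chr(ord("a") + n - 1) + ")"
    return style  # bullet and "+" lists repeat the same mark


class ListState:
    def __init__(self) -> None:
        # One [style, items_so_far] pair per active list; len(self.stack) acts as :g.
        self.stack = []

    def aL(self) -> None:
        """Begin a new list whose style depends only on the current level."""
        self.stack.append([STYLES[len(self.stack)], 0])

    def LI(self, text: str) -> str:
        """Produce one item in the innermost list, indented by nesting depth."""
        if not self.stack:
            raise RuntimeError("LI:no lists active")
        entry = self.stack[-1]
        entry[1] += 1
        indent = "    " * (len(self.stack) - 1)
        return f"{indent}{item_mark(entry[0], entry[1])} {text}"

    def LE(self) -> None:
        """End the innermost list, so the level drops by one."""
        self.stack.pop()


# Replays the input example above.
s = ListState()
out = []
s.aL()
out.append(s.LI("first line."))
s.aL()
out.append(s.LI("second line."))
s.LE()
out.append(s.LI("third line."))
s.LE()
print("\n".join(out))
```

Because the outer list's counter is kept on the stack while the inner list is active, "third line." continues the outer list as B. rather than restarting at A.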


There is another approach to lists that is similar to the .H mechanism. The list-initialization macro, as well
as the .LI and .LE macros, are all combined into a single macro. That macro (called .bL below) requires
an argument to tell it what level of item is required; it adjusts the list level by either beginning a new
list or setting the list level back to a previous value, and then issues a .LI macro call to produce the
item:


.de bL
.ie \\n(.$ .nr g \\$1 \" if there is an argument, that is the level
.el .nr g \\n(:g \" if no argument, use current level
.if \\ng-\\n(:g>1 .)D "**ILLEGAL SKIPPING OF LEVEL" \" increasing level by more than 1
.if \\ng>\\n(:g \{.aL \\ng-1 \" if g > :g, begin new list
.  nr g \\n(:g\} \" and reset g to current level (.aL changes g)
.if \\n(:g>\\ng .LC \\ng \" if :g > g, prune back to correct level
.\" if :g = g, stay within current list
.LI \" in all cases, get out an item
..


For .bL to work, the previous definition of the .aL macro must be changed to obtain the value of g
from its argument, rather than from :g. Invoking .bL without arguments causes it to stay at the current
list level. The PWB/MM .LC macro (List Clear) removes list descriptions until the level is less than or
equal to that of its argument. For example, the .H macro includes the call “.LC 0”. If text is to be
resumed at the end of a list, insert the call “.LC 0” to clear out the lists completely. The example
below illustrates the relatively small amount of input needed by this approach. The input text:


The quick brown fox jumped over the lazy dog's back.
.bL 1
first line.
.bL 2
second line.
.bL 1
third line.
.bL
fourth line.
.LC 0
fifth line.


yields:
The quick brown fox jumped over the lazy dog's back.
A. first line.
     [1] second line.
B. third line.
C. fourth line.
fifth line.
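The decisions .bL makes (begin a new list when the requested level is exactly one deeper, prune with .LC when it is shallower, otherwise stay in the current list) can be checked outside the formatter. The following is a hypothetical Python sketch, not PWB/MM code: LevelLists and its methods are illustrative names, it reuses the five styles of the .aL macro, and it reports the two PWB/MM error conditions as Python exceptions.

```python
from typing import Optional

BULLET = "\u2022"
STYLES = ["A.", "[1]", BULLET, "a)", "+"]  # styles for list levels 1..5


def item_mark(style: str, n: int) -> str:
    """Mark for the n-th item (1-based) of a list drawn in the given style."""
    if style == "A.":
        return chr(ord("A") + n - 1) + "."
    if style == "[1]":
        return f"[{n}]"
    if style == "a)":
        return chr(ord("a") + n - 1) + ")"
    return style  # bullet and "+" lists repeat the same mark


class LevelLists:
    def __init__(self) -> None:
        self.stack = []  # [style, items_so_far] per active list; len() acts as :g

    def LC(self, level: int) -> None:
        """List Clear: discard list descriptions until the level is <= level."""
        while len(self.stack) > level:
            self.stack.pop()

    def bL(self, text: str, level: Optional[int] = None) -> str:
        current = len(self.stack)
        g = current if level is None else level  # no argument: stay at current level
        if g - current > 1:
            raise ValueError("**ILLEGAL SKIPPING OF LEVEL")
        if g < 1:
            raise ValueError("LI:no lists active")
        if g > current:
            self.stack.append([STYLES[g - 1], 0])  # begin a new list, as .aL does
        elif g < current:
            self.LC(g)  # prune back to the requested level
        entry = self.stack[-1]
        entry[1] += 1
        return "    " * (g - 1) + f"{item_mark(entry[0], entry[1])} {text}"


b = LevelLists()
out = [
    b.bL("first line.", 1),
    b.bL("second line.", 2),
    b.bL("third line.", 1),
    b.bL("fourth line."),
]
b.LC(0)  # ".LC 0" before resuming ordinary text
print("\n".join(out))
```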


Appendix C: SAMPLE FOOTNOTES 


The following example illustrates several footnote styles and both labeled and automatically-numbered
footnotes. The actual input for the immediately following text and for the footnotes at the bottom of this
page is shown on the following page:


With the footnote style set to the nroff default, we process a footnote¹ followed by another
one.***** Using the .FD macro, we changed the footnote style to hyphenate, right margin justification,
indent, and left justify the label. Here is a footnote,² and another.† The footnote style is now set,
again via the .FD macro, to no hyphenation, no right margin justification, no indentation, and with the
label left-justified. Here comes the final one.³


1. This is the first footnote text example (.FD 10). This is the default style for nroff. The right margin is not justified.
Hyphenation is not permitted. The text is indented, and the automatically generated label is right-justified in the
text-indent space.

***** This is the second footnote text example (.FD 10). This is also the default nroff style but with a long footnote label
provided by the user.

2. This is the third footnote example (.FD 1). The right margin is justified, the footnote text is indented, the label is
left-justified in the text-indent space. Although not necessarily illustrated by this example, hyphenation is permitted. The
quick brown fox jumped over the lazy dog's back.

† This is the fourth footnote example (.FD 1). The style is the same as the third footnote.

3. This is the fifth footnote example (.FD 6). The right margin is not justified, hyphenation is not permitted, the footnote
text is not indented, and the label is placed at the beginning of the first line. The quick brown fox jumped over the lazy
dog's back.

Now is the time for all good men to come to the aid of their country.


BAS 2 


.FD 10
With the footnote style set to the
.I nroff
default, we process a footnote\*F
.FS
This is the first footnote text example (.FD 10).
This is the default style for
.I nroff.
The right margin is
.I not
justified.
Hyphenation is
.I not
permitted.
The text is indented, and the automatically generated label is
.I right-justified
in the text-indent space.
.FE
followed by another one.«sees\q (a stands for a space) 
FS Saesa 

This is the second footnote text example (.FD 10). 

This is also the default 

1 nroff 

style but with a long footnote label provided by the user. 

re 
FD | 

Using the .FD macro, we changed the footnote style to hyphenate, mght margin justification, 
indent, and left justify the label. 

Here is a footnote,\«F 

.FS 

This is the third footnote example (.FD 1). 

The right margin is justified, the footnote text is indented, the label is 
1 left -justified 

in the text-indent space. 

Although not necessarily illustrated by this example, hyphenation is permitted. 
The quick brown fox jurnped over the lazy dog’s back. 

FE 

and another.\(dg\c 

FS \(dg 

This is the fourth footnote example (.FD 1). 

The style is the same as the third footnote. 

FE 

FD 6 

The footnote style is now set, again via the .FD macro, to no hyphenation, no right margin justification, 
no indentation, and with the label left-justified. 

Here comes the final one.\«F\o 

.FS 

This is the fifth footnote example (.FD 6). 

The right margin is 

I not 

justified, hyphenation is 

I not 

permitted, the footnote text is 

fl not 

indented, and the label is placed at the beginning of the first line. 
The quick brown fox jumped over the lazy dog's back. 

Now is the time for all good men to come to the aid of their country. 
re 
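The labels in this example follow a simple rule: footnotes introduced with \*F and a plain .FS take successive numbers from the :p register (which the F string increments), while a footnote given a label on .FS keeps that label. The following is a minimal Python sketch of that rule, assuming (as the output above suggests: 1, *****, 2, †, 3) that user-labeled footnotes do not advance the counter; footnote_labels is an illustrative name, not part of PWB/MM.

```python
from typing import List, Optional, Sequence


def footnote_labels(user_labels: Sequence[Optional[str]]) -> List[str]:
    """Label each footnote in document order.

    None stands for an automatically numbered footnote (\\*F with a plain .FS);
    a string stands for a label supplied explicitly on the .FS line.
    """
    labels = []
    counter = 0  # plays the role of the :p register incremented by \*F
    for label in user_labels:
        if label is None:
            counter += 1
            labels.append(str(counter))
        else:
            labels.append(label)
    return labels


# The five footnotes of the example, in order.
print(footnote_labels([None, "*****", None, "\u2020", None]))
```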
Appendix D: SAMPLE LETTER
The nroff and troff outputs corresponding to the input text below are shown on the following pages.


.ND "November 1, 1977"
.TL 334455
Out-of-Hours Course Description
.AU "D. W. Stevenson" DWS PY 9876 5432 1X-123
.MT 0
.DS
J. M. Jones:
.DE
.P
Please use the following description for the Out-of-Hours course
"Document Preparation on the PWB/UNIX*
.FS *
UNIX is a Trademark of Bell Laboratories.
.FE
time-sharing system":
.P
The course is intended for clerks, typists, and others
who intend to use the PWB/UNIX system
for preparing documentation.
The course will cover such topics as:
.VL 18
.LI Environment:
utilizing a time-sharing computer system;
accessing the system;
using appropriate output terminals.
.LI Files:
how text is stored on the system;
directories;
manipulating files.
.LI "Text editing:"
how to enter text so that subsequent revisions are easier to make;
how to use the editing system to
add, delete, and move lines of text;
how to make corrections.
.LI "Text processing:"
basic concepts;
use of general-purpose formatting packages.
.LI "Other facilities:"
additional capabilities useful to the typist such as the
.I "typo, spell, diff,"
and
.I grep
commands and a desk-calculator package.
.LE


S. P. Lename
H. O. Del
M. Hill
.NE


Bell Laboratories 


subject: Out-of-Hours Course Description date: November 1, 1977
Case: 334455 
from: D. W. Stevenson 
PY 9876 
1X-123 x5432


J. M. Jones:


Please use the following description for the Out-of-Hours course
"Document Preparation on the PWB/UNIX* time-sharing system":


The course is intended for clerks, typists, and others who 
intend to use the PWB/UNIX system for preparing documentation. 
The course will cover such topics as: 


Environment: utilizing a time-sharing computer system;
accessing the system; using appropriate output 
terminals. 


Files: how text is stored on the system; directories; 
manipulating files. 


Text editing: how to enter text so that subsequent revisions 
are easier to make; how to use the editing sys- 
tem to add, delete, and move lines of text; how 
to make corrections. 


Text processing: basic concepts; use of general-purpose format- 
ting packages. 


Other facilities: additional capabilities useful to the typist
such as the typo, spell, diff, and grep com-
mands and a desk-calculator package.


PY-9876-DWS-jrm D. W. Stevenson 


Copy to 

S. P. Lename 
H. O. Del 
M. Hill


* UNIX is a Trademark of Bell Laboratories. 
Bell Laboratories


subject: Out-of-Hours Course Description date: November 1, 1977
Case: 334455 
from: D. W. Stevenson 
PY 9876 
1X-123 x5432 


J. M. Jones: 


Please use the following description for the Out-of-Hours course “Document Preparation on the 
PWB/UNIX* time-sharing system”: 


The course is intended for clerks, typists, and others who intend to use the PWB/UNIX system for 
preparing documentation. The course will cover such topics as: 




Environment: utilizing a time-sharing computer system; accessing the system; using appropriate 
output terminals. 

Files: how text is stored on the system; directories; manipulating files.

Text editing: how to enter text so that subsequent revisions are easier to make; how to use the
editing system to add, delete, and move lines of text; how to make corrections.
Text processing: basic concepts; use of general-purpose formatting packages. 


Other facilities: additional capabilities useful to the typist such as the typo, spell, diff, and grep com-
mands and a desk-calculator package.


PY-9876-DWS-jrm D. W. Stevenson


Copy to 

S. P. Lename 
H. O. Del

M. Hill 


* UNIX is a Trademark of Bell Laboratories.
Appendix E: ERROR MESSAGES


I. PWB/MM Error Messages 


Each PWB/MM error message consists of a standard part followed by a variable part. The standard part is
of the form:

     ERROR:input line #:


The variable part consists of a descriptive message, usually beginning with a macro name. The variable 
parts are listed below in alphabetical order by macro name, each with a more complete explanation: 


Check TL, AU, AS, AE, MT sequence
        The proper sequence of macros for the beginning of a memorandum is shown in {6.9}.
        Something has disturbed this order.

AL:bad arg:value
        The argument to the .AL macro is not one of 1, A, a, I, or i. The incorrect argument is
        shown as value.

CS:cover sheet too long
        The text of the cover sheet is too long to fit on one page. The abstract should be reduced
        or the indent of the abstract should be decreased {6.4}.

DS:too many displays
        More than 26 floating displays are active at once, i.e., have been accumulated but not yet
        output.

DS:missing FE
        A display starts inside a footnote. The likely cause is the omission (or misspelling) of a
        .FE to end a previous footnote.

DS:missing DE
        .DS or .DF occurs within a display, i.e., a .DE has been omitted or mistyped.

DE:no DS or DF active
        .DE has been encountered but there has not been a previous .DS or .DF to match it.

FE:no FS
        .FE has been encountered with no previous .FS to match it.

FS:illegal inside TL or AS
        An .FS-.FE pair cannot be used inside the memorandum title or abstract.

FS:missing FE
        A previous .FS was not matched by a closing .FE, i.e., an attempt is being made to begin a
        footnote inside another one.

FS:missing DE
        A footnote starts inside a display, i.e., a .DS or .DF occurs without a matching .DE.

H:bad arg:value
        The first argument to .H must be a single digit from 1 to 7, but value has been supplied
        instead.

H:missing FE
        A heading macro (.H or .HU) occurs inside a footnote.

H:missing DE
        A heading macro (.H or .HU) occurs inside a display.

H:missing arg
        .H needs at least 1 argument.

HU:missing arg
        .HU needs 1 argument.

LB:missing arg(s)
        .LB requires at least 4 arguments.

LB:too many nested lists
        Another list was started when there were already 6 active lists.

LE:mismatched
        .LE has occurred without a previous .LB or other list-initialization macro {5.3.3}.
        Although this is not a fatal error, the message is issued because there almost certainly
        exists some problem in the preceding text.

LI:no lists active
        .LI occurs without a preceding list-initialization macro. The latter has probably been
        omitted, or has been separated from the .LI by an intervening .H or .HU.

ML:missing arg
        .ML requires at least 1 argument.

ND:missing arg
        .ND requires 1 argument.

SA:bad arg:value
        The argument to .SA (if any) must be either 0 or 1. The incorrect argument is shown as
        value.

SG:missing DE
        .SG occurs inside a display.

SG:missing FE
        .SG occurs inside a footnote.

SG:no authors
        .SG occurs without any previous .AU macro(s).

VL:missing arg
        .VL requires at least 1 argument.


23. This list is set up by “.LB 37 0 2 0” {5.4}.
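Because every message shares the same standard part, a captured list of diagnostics can be sorted by macro name. The following is a minimal Python sketch, assuming that the input line number appears as a decimal integer in place of "#" and that the variable part follows directly after the second colon; parse_mm_error and MMError are illustrative names, not part of PWB/MM.

```python
import re
from typing import NamedTuple, Optional


class MMError(NamedTuple):
    line: int
    macro: Optional[str]  # None for messages such as "Check TL, AU, AS, AE, MT sequence"
    detail: str


_STANDARD_PART = re.compile(r"^ERROR:(\d+):(.*)$")
_MACRO_NAME = re.compile(r"[A-Z0-9]{1,2}")  # the macro names in this list have 1 or 2 characters


def parse_mm_error(message: str) -> MMError:
    """Split a PWB/MM error message into line number, macro name, and description."""
    m = _STANDARD_PART.match(message.strip())
    if m is None:
        raise ValueError(f"not a PWB/MM error message: {message!r}")
    line, variable_part = int(m.group(1)), m.group(2).strip()
    macro, sep, detail = variable_part.partition(":")
    if sep and _MACRO_NAME.fullmatch(macro):
        return MMError(line, macro, detail)
    return MMError(line, None, variable_part)


print(parse_mm_error("ERROR:57:AL:bad arg:Q"))
```

Only the first colon of the variable part is used as a separator, so details such as "bad arg:Q" keep their own colon.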
II. Formatter Error Messages 


Most messages issued by the formatter are self-explanatory. Those error messages over which the user
has (some) control are listed below. Any other error messages should be reported to the local
system-support group.


“Cannot open filename” is issued if one of the files in the list of files to be processed cannot be
opened. If the filename is of the form /usr/lib/tmac.name, then the option -mname specifies an
incorrect name. If the filename is of the form /usr/lib/term/name, then the nroff option -Tname
is incorrect. If the filename is of the form /usr/lib/font/ftxx, then the font specified in a formatter
.fp request is incorrect.
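The three filename patterns above amount to a small lookup from the file that could not be opened to the option or request that named it. The following is a minimal Python sketch of that lookup; diagnose_cannot_open is a hypothetical helper, and the /usr/lib/font/ prefix is an assumption about where .fp looks for font files.

```python
def diagnose_cannot_open(filename: str) -> str:
    """Map the file named in a "Cannot open" message to its likely cause."""
    tmac_prefix = "/usr/lib/tmac."
    term_prefix = "/usr/lib/term/"
    font_prefix = "/usr/lib/font/"  # assumed location of font files loaded by .fp
    if filename.startswith(tmac_prefix):
        name = filename[len(tmac_prefix):]
        return f"option -m{name} names a macro package that does not exist"
    if filename.startswith(term_prefix):
        name = filename[len(term_prefix):]
        return f"nroff option -T{name} names an unknown terminal"
    if filename.startswith(font_prefix):
        return "a .fp request names a font that does not exist"
    return "an input file named on the command line is missing or unreadable"


print(diagnose_cannot_open("/usr/lib/tmac.q"))
```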


“Exception word list full” indicates that too many words have been specified in the hyphenation
exception list (via .hw requests).


“Line overflow” means that the output line being generated was too long for the formatter's line
buffer. The excess was discarded. See the “Word overflow” message below.


“Out of temp file space” means that additional temporary space for macro definitions, diversions, etc.
cannot be allocated. This message often occurs because of unclosed diversions (missing .FE or
.DE), unclosed macro definitions (e.g., missing “..”), or a huge table of contents.


“Too many page numbers” is issued when the list of pages specified to the formatter -o option is too
long.


“Too many string/macro names” is issued when the pool of string and macro names is full. Unneeded
strings and macros can be deleted using the .rm request.


“Too many number registers” means that the pool of number register names is full. Unneeded
registers can be deleted by using the .rr request.


“Word overflow” means that a word being generated exceeded the formatter's word buffer. The
excess characters were discarded. A likely cause for this and for the “Line overflow” message
above is very long lines or words generated through the misuse of \c or of the .cu request, or
very long equations produced by eqn(I)/neqn(I).
Appendix F: SUMMARY OF MACROS, STRINGS, AND NUMBER REGISTERS
I. Macros 


The following is an alphabetical list of macro names used by PWB/MM. The first line of each item gives
the name of the macro, a brief description, and a reference to the section in which the macro is
described. The second line gives a prototype call of the macro.


Macros marked with an asterisk are not, in general, invoked directly by the user. Rather, they are
“user exits” called from inside header, footer, or other macros.


1C      One-column processing {11.4}
        .1C

2C      Two-column processing {11.4}
        .2C

AE      Abstract end {6.4}
        .AE

AF      Alternate format of “Subject/Date/From” block {6.7.2}
        .AF [company-name]

AL      Automatically-incremented list start {5.3.3.1}
        .AL [type] [text-indent] [1]

AS      Abstract start {6.4}
        .AS [arg] [indent]

AU      Author information {6.2}
        .AU name [initials] [loc] [dept] [ext] [room] [arg] [arg] [arg]

B       Bold (underline in nroff) {11.1}
        .B [bold-arg] [previous-font-arg]

BL      Bullet list start {5.3.3.2}
        .BL [text-indent] [1]

CS      Cover sheet {10.2}
        .CS [pages] [other] [total] [figs] [tbls] [refs]

DE      Display end {7.1}
        .DE

DF      Display floating start {7.2}
        .DF [format] [fill]

DL      Dash list start {5.3.3.3}
        .DL [text-indent] [1]

DS      Display static start {7.1}
        .DS [format] [fill]

EC      Equation caption {7.5}
        .EC [title] [override] [flag]

EF      Even-page footer {9.6}
        .EF [arg]

EH      Even-page header {9.3}
        .EH [arg]

EN      End equation display {7.4}
        .EN

EQ      Equation display start {7.4}
        .EQ

FD      Footnote default format {8.3}
        .FD [arg] [1]

FE      Footnote end {8.2}
        .FE

FG      Figure title {7.5}
        .FG [title] [override] [flag]

FS      Footnote start {8.2}
        .FS [label]

H       Heading (numbered) {4.2}
        .H level [heading-text]

HC      Hyphenation character {3.4}
        .HC [hyphenation-indicator]

HM      Heading mark style (Arabic or Roman numerals, or letters) {4.2.2.5}
        .HM [arg1] ... [arg7]

HU      Heading (unnumbered) {4.3}
        .HU heading-text

HX      Heading user exit X (before printing heading) {4.6}
        .HX dlevel rlevel heading-text

HZ      Heading user exit Z (after printing heading) {4.6}
        .HZ dlevel rlevel heading-text

I       Italic (underline in nroff) {11.1}
        .I [italic-arg] [previous-font-arg]

LB      List begin {5.4}
        .LB text-indent mark-indent pad type [mark] [LI-space] [LB-space]

LC      List-status clear {Appendix B}
        .LC [list-level]

LE      List end {5.3.2}
        .LE [1]

LI      List item {5.3.1}
        .LI [mark] [1]

ML      Marked list start {5.3.3.4}
        .ML mark [text-indent] [1]

MT      Memorandum type {6.6}
        .MT [type] [1]

ND      New date {6.7.1}
        .ND new-date

NE      Notation end {6.11.2}
        .NE

NS      Notation start {6.11.2}
        .NS [arg]

OF      Odd-page footer {9.7}
        .OF [arg]

OH      Odd-page header {9.4}
        .OH [arg]

OK      Other keywords for TM cover sheet {6.5}
        .OK [keyword] ...

P       Paragraph {4.1}
        .P [type]

PF      Page footer {9.5}
        .PF [arg]

PH      Page header {9.2}
        .PH [arg]

PX*     Page-header user exit {9.12}
        .PX

R       Return to regular (roman) font (end underlining in nroff) {11.1}
        .R

RL      Reference list start {5.3.3.5}
        .RL [text-indent] [1]

S       Set troff point size and vertical spacing {11.3}
        .S [arg]

SA      Set adjustment (right-margin justification) default {11.2}
        .SA [arg]

SG      Signature line {6.11.1}
        .SG [arg] [1]

SK      Skip pages {11.7}
        .SK [pages]

SP      Space vertically {11.6}
        .SP [lines]

TB      Table title {7.5}
        .TB [title] [override] [flag]

TC      Table of contents {10.1}
        .TC [slevel] [spacing] [tlevel] [tab] [head1] [head2] [head3] [head4] [head5]

TE      Table end {7.3}
        .TE

TL      Title of memorandum {6.1}
        .TL [charging-case] [filing-case]

TM      Technical Memorandum number(s) {6.3}
        .TM [number] ...

TP*     Top-of-page macro {9.12}
        .TP

TS      Table start {7.3}
        .TS

TX*     Table-of-contents user exit {10.1}
        .TX

VL      Variable-item list start {5.3.3.6}
        .VL text-indent [mark-indent] [1]

II. Strings


The following is an alphabetical list of string names used by PWB/MM, giving for each a brief
description, section reference, and initial (default) value(s). See {1.4} for notes on setting and
referencing strings.


BU      Bullet {3.7}
        nroff: •
        troff: •

DT      Date (current date, unless overridden) {6.7.1}
        Month day, year (e.g., October 31, 1977)

F       Footnote numberer {8.1}
        nroff: \u\\n+(:p\d
        troff: \v'-.4m'\s-3\\n+(:p\s0\v'.4m'

HF      Heading font list, up to seven codes for heading levels 1 through 7 {4.2.2.4.1}
        3 3 2 2 2 2 2 (all underlined in nroff, and B B I I I I I in troff)

RE      SCCS Release and Level of PWB/MM {11.3}
        Release.Level (e.g., 12.2)


Note that if the released-paper style is used, then, in addition to the above strings, certain BTL location 
codes are defined as strings; these location strings are needed only until the .MT macro is called {6.8}. 


III. Number Registers


This section provides an alphabetical list of register names, giving for each a brief description, section
reference, initial (default) value, and the legal range of values (where [m:n] means values from m to n
inclusive).


Any register having a single-character name can be set from the command line. An asterisk
attached to a register name indicates that that register can be set only from the command line or before
the PWB/MM macro definitions are read by the formatter {2.4, 2.5}. See {1.4} for notes on setting and
referencing registers.


A*      Has the effect of invoking the .AF macro without an argument {2.4}
        0, [0:1]

Au      Inhibits printing of author's location, department, room, and extension in the “from” portion
        of a memorandum {6.2}
        1, [0:1]

B*      Defines table-of-contents and/or cover-sheet macros {2.4}
        0, [0:3]

C*      Copy type (Original, DRAFT, etc.) {2.4}
        0 (Original), [0:3]

Cl      Contents level (i.e., level of headings saved for table of contents) {4.4}
        2, [0:7]

D*      Debug flag {2.4}
        0, [0:1]

Ds      Static display pre- and post-space {7.1}
        1, [0:1]

Ec      Equation counter, used by .EC macro {7.5}
        0, [0:?], incremented by 1 for each .EC call.

Ej      Page-ejection flag for headings {4.2.2.1}
        0 (no eject), [0:7]

Fg      Figure counter, used by .FG macro {7.5}
        0, [0:?], incremented by 1 for each .FG call.

Fs      Footnote space (i.e., spacing between footnotes) {8.4}
        1, [0:2]

H1-H7   Heading counters for levels 1-7 {4.2.2.5}
        0, [0:?], incremented by .H of corresponding level or .HU if at level given by register Hu.
        H2-H7 are reset to 0 by any heading at a lower-numbered level.

Hb      Heading break level (after .H and .HU) {4.2.2.2}
        2, [0:7]

Hc      Heading centering level for .H and .HU {4.2.2.3}
        0 (no centered headings), [0:7]

Hi      Heading temporary indent (after .H and .HU) {4.2.2.2}
        1 (indent as paragraph), [0:2]

Hs      Heading space level (after .H and .HU) {4.2.2.2}
        2 (space only after .H 1 and .H 2), [0:7]

Ht      Heading type (for .H: single or concatenated numbers) {4.2.2.5}
        0 (concatenated numbers: 1.1.1, etc.), [0:1]

Hu      Heading level for unnumbered heading (.HU) {4.3}
        2 (.HU at the same level as .H 2), [0:7]

Hy      Hyphenation control for body of document {3.4}
        1 (automatic hyphenation on), [0:1]

L*      Length of page {2.4}
        66, [20:?] (11i, [2i:?] in troff)²⁴

Li      List indent {5.3.3.1}
        5, [0:7]

N*      Numbering style {2.4}
        0, [0:3]

O*      Offset of page {2.4}
        0, [0:?] (0.5i, [0i:?] in troff)²⁴

P       Page number, managed by PWB/MM {2.4}
        0, [0:?]

Pi      Paragraph indent {4.1}
        5, [0:?]

Pt      Paragraph type {4.1}
        2 (paragraphs indented except after headings, lists, and displays), [0:2]

S*      Troff default point size {2.4}
        10, [6:36]

Si      Standard indent for displays {7.1}
        5, [0:7]

T*      Type of nroff output device {2.4}
        0, [0:2]

Tb      Table counter {7.5}
        0, [0:?], incremented by 1 for each .TB call.

U*      Underlining style (nroff) for .H and .HU {2.4}
        0 (continuous underline when possible), [0:1]

W*      Width of page (line and title length) {2.4}
        65, [10:1365] (6.5i, [2i:7.54i] in troff)²⁴


24. For nroff, these values are unscaled numbers representing lines or character positions; for troff, these values must be scaled.
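The counting rules for H1-H7 (a heading increments the counter of its own level and resets every counter below it) determine the concatenated marks produced when Ht is 0. The following is a minimal Python sketch of those rules, not PWB/MM code: HeadingCounters is an illustrative name, .HU is not modeled, and the trailing period on level-1 marks only ("1." but "1.1") is taken from the sample output in the guide that follows.

```python
class HeadingCounters:
    """Hypothetical model of registers H1-H7 with concatenated marks (Ht = 0)."""

    def __init__(self) -> None:
        self.h = [0] * 8  # h[1]..h[7] mirror H1..H7; h[0] is unused

    def H(self, level: int) -> str:
        if not 1 <= level <= 7:
            raise ValueError(f"H:bad arg:{level}")
        self.h[level] += 1
        for deeper in range(level + 1, 8):  # a heading resets all deeper counters
            self.h[deeper] = 0
        mark = ".".join(str(self.h[i]) for i in range(1, level + 1))
        return mark + "." if level == 1 else mark  # "1." but "1.1", as in the sample output


hc = HeadingCounters()
print([hc.H(level) for level in (1, 2, 2, 3, 3)])
```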
Typing Documents with PWB/MM
D. W. Smith and E. M. Piskorik


Bell Laboratories
Piscataway, New Jersey 08854 
This guide shows several examples of documents
prepared with PWB/MM, a set of general-purpose
formatting macros used with the PWB/UNIX* text
formatters nroff and troff (as well as with the eqn/neqn
and tbl programs) to produce memoranda, letters,
books, manuals, etc. References to manuals for
these programs are given on p. 16.


In the examples, input is shown in this
Helvetica sans serif font.

The resulting output is shown (boxed) in this
Times Roman font.

Substitutable arguments are shown in this
Times Roman Italic font.


Square brackets ([...]) indicate that the
enclosed substitutable argument is optional.


All output shown in the examples was produced by troff;
nroff output would look somewhat different.†


Contents
Paragraphs and Headings .................. 2
Paragraph and Heading Parameters ......... 2
Lists and List Types ..................... 4
Nested Lists ............................. 5
Italic, Bold, and Underlining ............ 5
Displays ................................. 6
Footnotes ................................ 6
Simple Letter: Example ................... 7
Technical Memorandum: Example ............ 9
Memorandum-Style Macros .................. 11
Two-Column Output ........................ 13
Equations ................................ 14
Tables ................................... 15
How to Get Output ........................ 16
References ............................... 16


* UNIX is a Trademark of Bell Laboratories.


† For example, what we call a “blank line” is a blank line
in nroff, but is ½ of a vertical space in troff, while headings
that are underlined in nroff are either bold or italic
in troff.

October 1977 


Paragraphs and Headings 


The output for the following is shown on p. 3.
.H 1 "PARAGRAPHS AND HEADINGS"
This section describes the types of paragraphs
and the kinds of headings that are available.
.H 2 Paragraphs
Paragraphs are specified by the .P macro.
Usually, they are indented except
after headings, lists, and displays.
The number register Pt is used
to change the paragraph style.
.H 2 Headings
.H 3 "Numbered Headings."
There are seven levels of numbered headings.
Level 1 is the most major or highest;
level 7, the lowest.
.P
Headings are specified with the .H macro, whose
first argument is the level of heading (1 through 7).
.P
The appearance of headings varies according to
the level.
On output, level 1 headings are preceded by two
blank lines; all others are preceded by one blank line.
Level 1 and level 2 headings produce stand-alone
headings, underlined in
.I nroff
and bold in
.I troff .
Levels 3 through 7 are run-in and underlined (or italic).
.H 3 "Unnumbered Headings."
The macro .HU is a special case
of .H, in that no heading number is printed.
Each .HU heading has the level given by
the register Hu, whose initial value is 2.
Usually, the value of that register is
set to make unnumbered headings (if any) occur
at the lowest heading level in a document.


Paragraph and Heading Parameters 


There are many parameters that can change the out- 
put appearance of headings and paragraphs. Given 
below are some of these parameters, their defaul 
values, and their meanings (level 1 is the mos? mayor 
or Aighesz, while level 7 is the lowesr): 


ne Pi § paragraph-indent in characters (or ens). 
nr Pt O never indent paragraphs. 
Jy Pt 1° always indent paragraphs. 
nr Pt 2 indent paragraphs excepr after 
headings, lists, and displays (defauir). 
Os HF 3322222 
font specification for each 
of the 7 heading levels: 
1 indicates rornan, 
2 indicates italic, 
3 indicates bold. 


ape 


Default Heading Style 


1. PARAGRAPHS AND HEADINGS to get: type: 
This section describes the types of paragraphs and HEAD ° . 
the kinds of headings that are available. a= a ou HEADING 


1.1 Paragraphs 

Paragraphs are specified by the .P macro. Usu- 
ally, they are indented except after headings. 
lists, and dispiavs. The number register Pr is 


mp Heading 4 2 “Heading’ 
Text... Text ... 


used to change the paragraph styie. nin Heading. Text... = de * Heading.” 
1.2 Headings Se eee RN re eee 
1.2.1] Numbered Headings There are seven leveis : : 
of numbered headings. Level 1 is the most Lists and List Types 
major or highest, level 7, the lowest.
Headings are specified with the .H macro, whose first
argument is the level of heading (1 through 7).
The appearance of headings varies according to the
level. On output, level 1 headings are preceded by two
blank lines; all others are preceded by one blank line.
Level 1 and level 2 headings produce stand-alone
headings, underlined in nroff and bold in troff. Levels 3
through 7 are run-in and underlined (or italic).
1.2.2 Unnumbered Headings  The macro .HU is a
special case of .H, in that no heading number is printed.
Each .HU heading has the level given by the register Hu,
whose initial value is 2. Usually, the value of that
register is set to make unnumbered headings (if any)
occur at the lowest heading level in a document.

.HM 1 1 1 1 1 1 1

The above yields an all-numeric marking style.
Available styles are: 1, 0001, A, a, I, and i.

.nr Hb 2   lowest heading level that is stand-alone
           (i.e., not run-in with the following text).
.nr Hc 0   lowest heading level that is centered.
.nr Hs 2   lowest heading level after which
           there is a blank line.
.nr Ht 0   heading marks will be concatenated.
.nr Hu 2   unnumbered headings (.HU) are
           equivalent to numbered headings at this
           level for spacing, font, and counting.
.nr Cl 2   lowest heading level to be saved for
           the table of contents.
.nr Ej 0   lowest heading level that forces the
           start of a new page.

All lists have a list begin macro, one or more list
items (each consisting of a .LI macro followed by the
list item text), and the list end macro .LE. That is,
lists are typed like this:

list begin macro
.LI
list item text ...
.LI
list item text ...
  :
.LE

where the list begin macro is one of the following:

.AL [type] [indent]   automatic list
                      (type is 1, A, a, I, or i;
                      if omitted, defaults to 1)
.BL [indent]          bullet list
.DL [indent]          dash list
.ML mark [indent]     marked list
                      (mark is the desired mark)
.RL [indent]          reference list
.VL indent            variable list

indent is the number of characters of indentation; if it
is optional and omitted, the default indentation for the
given list style is used; mark will appear to the left of
the indentation.

The output for the following is shown on p. 5.
.AL 1
.LI
Pencilpusher, I., and Hardwired, X.
A New Kind of Set Screw.
.I "Proc. IEEE"
.B 75
(1976), 235-41.
.LI
Nails, H., and Irons, R.
Fasteners for Printed Circuit Boards.
.I "Proc. ASME"
.B 123
(1974), 23-24.
.LE


1. Pencilpusher, I., and Hardwired, X. A New
   Kind of Set Screw. Proc. IEEE 75 (1976),
   235-41.

2. Nails, H., and Irons, R. Fasteners for
   Printed Circuit Boards. Proc. ASME 123
   (1974), 23-24.


Nested Lists 


This is ordinary text to show
the margins of the page.
.AL 1
.LI
First-level item.
.AL a
.LI
Second-level item.
.LI
Another second-level item, but
somewhat longer.
.LE
.LI
Return to previous list (and to previous value
of indentation) at this point.
.LI
Another line.
.LE
.P
Now we're out of the lists and at the margin that
existed at the beginning of this example.

This is ordinary text to show the margins of the
page.
1. First-level item.
   a. Second-level item.
   b. Another second-level item, but some-
      what longer.
2. Return to previous list (and to previous
   value of indentation) at this point.
3. Another line.
Now we're out of the lists and at the margin that
existed at the beginning of this example.
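The .AL list types and the .HM heading-mark styles use the same small set of counter formats: 1, A, a, I, and i, plus 0001 for headings. When .nr Ht 0 is set, heading marks are concatenated level by level. The Python sketch below models only that numbering, under some assumptions. The names `mark` and `HeadingCounter` are hypothetical. Numbering past Z as AA, AB, ... is an assumption. The punctuation and spacing MM places after a mark are not modeled. It is an illustration, not the PWB/MM macro code.

```python
# Hypothetical model of MM-style list and heading marks (not the macro code).
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
         (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
         (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def roman(n: int) -> str:
    """Upper-case Roman numeral for n >= 1."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def letters(n: int) -> str:
    """1 -> A, 26 -> Z, 27 -> AA (the behaviour past Z is an assumption)."""
    s = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        s = chr(ord("A") + r) + s
    return s


def mark(n: int, style: str) -> str:
    """Format counter value n in one of the styles 1, 0001, A, a, I, i."""
    if n < 1:
        raise ValueError("counter values start at 1")
    formats = {
        "1": str,
        "0001": lambda k: f"{k:04d}",
        "A": letters,
        "a": lambda k: letters(k).lower(),
        "I": roman,
        "i": lambda k: roman(k).lower(),
    }
    if style not in formats:
        raise ValueError(f"unknown marking style {style!r}")
    return formats[style](n)


class HeadingCounter:
    """Seven heading levels with concatenated marks (the .nr Ht 0 case)."""

    def __init__(self, styles=("1",) * 7):  # like .HM 1 1 1 1 1 1 1
        self.styles = list(styles)
        self.counts = [0] * 7

    def heading(self, level: int) -> str:
        if not 1 <= level <= 7:
            raise ValueError("heading levels run from 1 to 7")
        self.counts[level - 1] += 1
        for deeper in range(level, 7):  # a new heading restarts all lower levels
            self.counts[deeper] = 0
        return ".".join(mark(self.counts[i], self.styles[i]) for i in range(level))


if __name__ == "__main__":
    h = HeadingCounter()
    print(h.heading(1), h.heading(2), h.heading(2), h.heading(1))  # prints: 1 1.1 1.2 2
```

Running the file prints `1 1.1 1.2 2`. This is the same pattern as the numbered headings in the technical memorandum example. Note that the sketch raises an error for a level-2 heading that precedes any level-1 heading.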


Italic, Bold, and Underlining 


In the examples on pp. 4 and 7, the macros .I, .B,
and .R are used to change to, respectively, the italic,
bold, and roman fonts in troff. In nroff, both .I and
.B cause underlining until the occurrence of .R,
which turns it off. A single argument given to either
.I or .B results in that argument being underlined by
nroff, or printed in the corresponding font by troff.


Displays 


Displays are blocks of text that are to be kept
together (not split across pages). A static display
(.DS) appears in the same relative position in the
output text as it does in the input text; this may
result in extra white space at the bottom of a page if
a static display is too big to fit there. A floating
display (.DF), on the other hand, will "float"
through the input text to the top of the next page if
there is not enough room for it on the current page;
thus, the text that follows a floating display in the
input may precede it in the output. Displays can be
positioned at the left margin, indented, or centered.


.DS [format] [fill]      .DF [format] [fill]
text ...                 text ...
.DE                      .DE

where format and fill have the following meanings:

format      meaning
(none)      no indent
0           no indent
1           indent
2           center


Highland Avenue, Mountain Station,
South Orange, Maplewood, Millburn, Short Hills:
.DS 1
and now
for something
completely different
.DE
Summit, Chatham, Madison,
Convent Station, Morristown, New Providence,
Murray Hill, Berkeley Heights.

Highland Avenue, Mountain Station, South
Orange, Maplewood, Millburn, Short Hills:
     and now
     for something
     completely different
Summit, Chatham, Madison, Convent Station,
Morristown, New Providence, Murray Hill,
Berkeley Heights.


Footnotes 


Two styles of footnote marking are shown on p. 7.
In the first, the asterisk is the mark placed on the
footnote and the following .FS macro call, while in
the second, a number is automatically generated to
mark the footnote. The macros .FS and .FE are
used to delimit the footnote text that is to appear at
the bottom of the page.




Among the most important occupants
of the workbench are the long-nosed pliers.
Without this basic tool,*
.FS *
As first shown by Tiger & Leopard (1975).
.FE
few assemblies could be completed.
They may lack the popular\*F
.FS
According to Panther & Lion (1977).
.FE
appeal of the sledgehammer ...

Among the most important occupants of the
workbench are the long-nosed pliers. Without
this basic tool,* few assemblies could be com-
pleted. They may lack the popular¹ appeal of the
sledgehammer ...

* As first shown by Tiger & Leopard (1975).
1. According to Panther & Lion (1977).


Simple Letter—Example 


The output for the following is shown on p. 8.
.nr Pt 0
.ND "May 1, 1977"
.TL
PWB/MM Class
.AU "J. J. Jones" JJJ PY 9999 5001 1Q-100
.MT ""
.DS
To All Students:
.DE
.P
There will be a class on the document preparation
facilities of PWB/MM on November 15-18.
This class lasts for 4 half-day (morning) sessions,
each consisting of a lecture
and practice exercises on the system.
.P
The meeting rooms for the class are:
.DS 1
.ta 15n               (n represents character positions)
Monday→4D-502         (→ indicates a tab)
Tuesday→4D-502
Wednesday→2B-639
Thursday→2C-641.
.DE
.P
Please read the following before attending class:
.DL
.LI
.I "UNIX for Beginners,"
Sections I and II.
.LI
.I
A Tutorial Introduction to the UNIX Text Editor.
.R
.LE
These can be obtained from the Computing
Information Library.
.SG ae
.NS
G. H. Hurtz
S. P. LeName
.NE

Bell Laboratories
subject: PWB/MM Class            date: May 1, 1977
                                 from: J. J. Jones
                                       PY 9999
                                       1Q-100 x5001

To All Students:
There will be a class on the document preparation
facilities of PWB/MM on November 15-18. This
class lasts for 4 half-day (morning) sessions, each
consisting of a lecture and practice exercises on
the system.
The meeting rooms for the class are:
     Monday      4D-502
     Tuesday     4D-502
     Wednesday   2B-639
     Thursday    2C-641.
Please read the following before attending class:
- UNIX for Beginners, Sections I and II.
- A Tutorial Introduction to the UNIX Text Editor.
These can be obtained from the Computing
Information Library.

PY-9999-JJJ-ae                   J. J. Jones
Copy to
G. H. Hurtz
S. P. LeName
Technical Memorandum—Example


The output for the following is shown on pp. 10-12.
.nr Pt 1
.ND "June 29, 1977"
.TL 12345 666666
On Constructing a Table of All
Even Prime Numbers
.AU "S. P. LeName" SPL PY 9999 4000 1Z-123
.AU "G. H. Hurtz" GHH PY 9999 4001 1Z-121
.TM 76543210
.AS
.P
This is an abstract for a technical memorandum.
The abstract will appear on the cover
sheet and on the first page
unless
the macro .AS has an argument of 1, in which case
the abstract will be printed only on the cover sheet.
The TM number appears on the cover sheet
and on the first page.
"Other Keywords" appear only on the cover sheet.
.P
The abstract may consist of one or more paragraphs;
it must fit on the cover sheet.
.AE
.OK "Prime Numbers" Even
.MT
.H 1 "INTRODUCTORY MATERIAL"
The first line of the body of the memorandum
immediately follows the macro call for
the heading (.H).
Alternately, lower-level heading macros may follow it,
as well as macros for lists, paragraphs, and so on.
A brief example of a list follows:
.AL A
.LI
This is the first item in an alphabetical
list in the body of this memorandum.
.LI
This is the second item in the list.
.AL 1
.LI
This is the first item in a (numbered) sub-list.
.LI
This is the second item in that sub-list.
.LE
.LE
.P
This is the second paragraph under the first heading.
In addition to alphabetized and numbered lists, there
are bullet lists, dash lists, variable lists, etc.
.H 2 "First Second-Level Heading"
This is the first paragraph under a
second-level heading.
Notice how that heading is numbered and
where the heading and text are printed.
.H 1 "SECOND FIRST-LEVEL HEADING"
This is the first paragraph under the
second first-level heading of the memorandum.
.HU REFERENCES
.RL
.LI
Pencilpusher, I., and Hardwired, X.
A New Kind of Set Screw.
.I "Proc. IEEE"
.B 75
(1976), 235-41.
.LI
Nails, H., and Irons, R.
Fasteners for Printed Circuit Boards.
.I "Proc. ASME"
.B 123
(1974), 23-24.
.LE

Bell Laboratories
subject: On Constructing a       date: June 29, 1977
         Table of All Even       from: S. P. LeName
         Prime Numbers                 PY 9999
                                       1Z-123 x4000
Case: 12345
File: 666666                           G. H. Hurtz
                                       PY 9999
                                       1Z-121 x4001

                                       TM: 76543210

ABSTRACT

This is an abstract for a technical
memorandum. The abstract will appear on
the cover sheet and on the first page unless
the macro .AS has an argument of 1, in
which case the abstract will be printed only
on the cover sheet. The TM number appears
on the cover sheet and on the first page.
"Other Keywords" appear only on the cover
sheet.

The abstract may consist of one or more
paragraphs; it must fit on the cover sheet.

MEMORANDUM FOR FILE

1. INTRODUCTORY MATERIAL
The first line of the body of the memorandum
immediately follows the macro call for the heading (.H).
Alternately, lower-level heading macros may follow
it, as well as macros for lists, paragraphs, and so on. A
brief example of a list follows:
A. This is the first item in an alphabetical list in the
   body of this memorandum.
B. This is the second item in the list.
   1. This is the first item in a (numbered) sub-list.
   2. This is the second item in that sub-list.
This is the second paragraph under the first heading.
In addition to alphabetized and numbered lists, there are
bullet lists, dash lists, variable lists, etc.
1.1 First Second-Level Heading
This is the first paragraph under a second-level head-
ing. Notice how that heading is numbered and where
the heading and text are printed.

2. SECOND FIRST-LEVEL HEADING
This is the first paragraph under the second first-level
heading of the memorandum.

REFERENCES
[1] Pencilpusher, I., and Hardwired, X. A New Kind of
    Set Screw. Proc. IEEE 75 (1976), 235-41.
[2] Nails, H., and Irons, R. Fasteners for Printed Circuit
    Boards. Proc. ASME 123 (1974), 23-24.

                                       S. P. LeName

PY-9999-SPL/GHH-rfg
Att.
Copy (without att.) to
G. B. Brown
C. P. Jones
J. J. Smith

G. H. Hurtz


Memorandum-Style Macros 


Macros for a memorandum-style document must be
invoked in the order shown on pp. 9-10. Once the
"memorandum type" (.MT) macro has been
invoked, none of the macros that precede it can be
used. The .MT macro controls the format of the
"subject, date, from" portion of the first page of the
memorandum. Different arguments to the .MT
macro will produce different kinds of memoranda:

Code      Meaning
.MT ""    no memorandum type is printed
.MT 0     no memorandum type is printed
.MT       MEMORANDUM FOR FILE
.MT 1     MEMORANDUM FOR FILE
.MT 2     PROGRAMMER'S NOTES
.MT 3     ENGINEER'S NOTES
.MT 4     Released-Paper style
.MT 5     External Letter

External Letter 


The information contained herein ... not for publication ...

Cover Sheet for TM

Title: On Constructing a Table of      Date: June 29, 1977
       All Even Prime Numbers
TM: 76543210

Other Keywords: Prime Numbers
                Even

Author(s)       Location    Ext.    Charging Case: 12345
S. P. LeName    PY 1Z-123   4000    Filing Case: 666666
G. H. Hurtz     PY 1Z-121   4001

ABSTRACT

This is an abstract for a technical
memorandum. The abstract will appear on
the cover sheet and on the first page unless
the macro .AS has an argument of 1, in
which case the abstract will be printed only
on the cover sheet. The TM number appears
on the cover sheet and on the first page.
"Other Keywords" appear only on the cover
sheet.

The abstract may consist of one or more
paragraphs; it must fit on the cover sheet.

Pages Text: 2    Other: 1    Total: 3
No. Figures: 0   No. Tables: 0   No. Refs.: 2

Z-0000-X     SEE REVERSE SIDE FOR DISTRIBUTION LIST


The input and the resulting output for a simple
letter are shown on pp. 7-8. Note that the .TM,
.AS/.AE, and .OK macros are not used there, and
that the .MT macro has a null argument (""). Documents
of the type shown on pages 2-3 (essentially
plain text) are produced by omitting, as well, the
other "memorandum-style" macros: .ND, .TL, .AU,
and .MT at the beginning of the document, and .SG,
.NS/.NE, and .CS at the end.

Like the .MT macro, the notation macro (.NS) may
also take different arguments to produce a variety of
notations following the signature line:

.NS ""    Copy to
.NS 0     Copy to
.NS       Copy to
.NS 1     Copy (with att.) to
.NS 2     Copy (without att.) to
          Letter to
          Memorandum to

If the .CS macro is included in the input file (see last
line of p. 10) and if the -rB2 option is included on
the command line (see p. 16), a cover sheet is
generated (see p. 12). (The 6 arguments to .CS are the
data for the bottom of the TM cover sheet: "Pages
Text," "Other," etc.) Similarly, the .TC macro,
together with the -rB1 or -rB3 option (see p. 16),
generates a table of contents. .CS and .TC can occur
only at the end of a document.


Two-Column Output 


.DS 2
The Declaration of Independence
.DE
.2C
.P
When in the Course of human events, it becomes
necessary for one people to dissolve the political
bands which have connected them with another, and
to assume among the powers of the earth, the
separate and equal station to which the Laws of
Nature and of Nature's God entitle them, a decent
respect to the opinions of mankind requires that they
should declare the causes which impel them to the
separation.
.P
We hold these truths to be self-evident, that all men
are created equal, ...

            The Declaration of Independence

When in the Course of         God entitle them, a
human events, it becomes      decent respect to the opin-
necessary for one people      ions of mankind requires
to dissolve the political     that they should declare
bands which have con-         the causes which impel
nected them with another,     them to the separation.
and to assume among the          We hold these truths
powers of the earth, the      to be self-evident, that all
separate and equal station    men are created equal, ...
to which the Laws of
Nature and of Nature's
Equations
A stand-alone equation is built within a display.
.DS 2
.EQ
left [ matrix {
col { A(11) above . above . }
col { . above . above . }
col { . above . above A(33) }
} right ] times left [ pile { alpha above beta above gamma } right ]
.EN
.DE

\[
\begin{bmatrix} A(11) & . & . \\ . & . & . \\ . & . & A(33) \end{bmatrix}
\times
\begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix}
\]
In-line equations may appear in running text if a
character has been defined to mark the left and right
ends of the equation. Normally, $ is used as that
character and is so defined by typing the following
three lines at the beginning of the document:

.EQ
delim $$
.EN

The quantities $a dot$, $b dotdot$, $xi tilde times
y vec$ are the values that show ...

The quantities \(\dot{a}\), \(\ddot{b}\), \(\tilde{\xi} \times \vec{y}\) are the values that show ...

This facility can be used for preparing text that
contains subscripts and superscripts:

The quantity $ a sub j sup 3 $ is ...

The quantity \(a_j^3\) is ...


For more examples, see p. 15 and Reference 4. 
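As a rough model of what delim does, each line of running text can be split at the delimiter characters. The enclosed pieces go to eqn and the rest stays ordinary text. The Python sketch below assumes single-character left and right delimiters ($ and $, as above). The name `split_inline` is hypothetical, and treating an unmatched delimiter as plain text is an assumption. Nothing here reproduces eqn's own processing.

```python
def split_inline(line: str, left: str = "$", right: str = "$"):
    """Split a text line into ("text", s) and ("eqn", s) segments.

    Hypothetical model of in-line equation delimiters; an unmatched
    opening delimiter is left in the text (an assumption).
    """
    segments = []
    pos = 0
    while True:
        start = line.find(left, pos)
        if start < 0:
            break
        end = line.find(right, start + len(left))
        if end < 0:
            break  # no closing delimiter: the rest stays ordinary text
        if start > pos:
            segments.append(("text", line[pos:start]))
        segments.append(("eqn", line[start + len(left):end]))
        pos = end + len(right)
    if pos < len(line):
        segments.append(("text", line[pos:]))
    return segments
```

For example, `split_inline("The quantity $a sub j sup 3$ is ...")` returns `[("text", "The quantity "), ("eqn", "a sub j sup 3"), ("text", " is ...")]`.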


Tables 


The meanings of the key-letters describing the
alignment of each entry are:

c   center          n   numerical
r   right-adjust    a   alphabetic subcolumn
l   left-adjust     s   spanned

Global table options are center, expand, box, allbox,
doublebox, and tab (x).


.DS
.TS
allbox;
c s s
c c c
n n n.
AT&T Common Stock
Year→Price→Dividend
1973→46-55→2.87
4→40-53→3.24
5→45-52→3.40
6→51-59→.95*
.TE
* (first quarter only)
.DE

(→ indicates a tab)

      AT&T Common Stock
 Year    Price    Dividend
 1973    46-55      2.87
    4    40-53      3.24
    5    45-52      3.40
    6    51-59       .95*
* (first quarter only)

.DS
.TS
  :
Name→Definition
  :
Sine→$sin (x) = 1 over 2j ( e sup jx - e sup -jx )$
Zeta→$zeta (s) = \
sum from k=1 to inf k sup -s ~~( Re~s > 1)$
.TE
.DE

Name    Definition
Sine    \(\sin(x) = \frac{1}{2j}\left(e^{jx} - e^{-jx}\right)\)
Zeta    \(\zeta(s) = \sum_{k=1}^{\infty} k^{-s} \quad (\mathrm{Re}\, s > 1)\)


For more examples, see Reference 3. 


How to Get Output

Documents with text only:
  nroff:  mm [options] files
     or   nroff [options] -mm files
  troff:  troff [options] -mm files
Text and equations:
  nroff:  mm -e [options] files
     or   neqn files | nroff [options] -mm -
  troff:  eqn files | troff [options] -mm -
Text and tables:
  nroff:  mm -t [options] files
     or   tbl files | nroff -mm [options] -
  troff:  tbl files | troff -mm [options] -
Text, tables, and equations:
  nroff:  mm -t -e [options] files
     or   tbl files | neqn | nroff [options] -mm -
  troff:  tbl files | eqn | troff [options] -mm -
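All of the pipelines above share one pattern. tbl runs first, then eqn (neqn when the formatter is nroff). The formatter then reads the combined result from standard input, which is what the trailing - means. The following Python sketch only assembles such command strings and runs nothing. `mm_pipeline` is a hypothetical helper. It always puts options before -mm, although the listing above uses both orders, and it does not produce the shorter mm command forms.

```python
def mm_pipeline(files, formatter="nroff", tables=False, equations=False, options=()):
    """Assemble (but do not run) a preprocessor pipeline like those listed above.

    Hypothetical helper: options are always placed before -mm.
    """
    if formatter not in ("nroff", "troff"):
        raise ValueError("formatter must be nroff or troff")
    file_list = " ".join(files)
    fmt = " ".join([formatter, *options, "-mm"])
    stages = []
    if tables:
        stages.append(f"tbl {file_list}")  # tbl always comes first
    if equations:
        eqn = "neqn" if formatter == "nroff" else "eqn"
        # eqn reads the files itself only when tbl is not ahead of it
        stages.append(eqn if stages else f"{eqn} {file_list}")
    if not stages:
        return f"{fmt} {file_list}"
    stages.append(f"{fmt} -")  # "-" makes the formatter read standard input
    return " | ".join(stages)
```

For example, `mm_pipeline(["memo"], "troff", tables=True, equations=True)` yields `tbl memo | eqn | troff -mm -`, matching the last line of the listing.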


The following options may be specified on the above
PWB/UNIX shell command lines:

-ok,m-n   print only page k, and pages m through n.
-rB1      include macros for the table of contents.
-rB2      include macros for the cover sheet.
-rB3      include macros for both.
-rC1      OFFICIAL FILE COPY in footer.
-rC2      DATE FILE COPY in footer.
-rC3      DRAFT in footer.
-rLn      set page length to n lines.
-rN1      page header at bottom of first page only.
-rN2      no page number on first page.
-rN3      section-page numbering.
-rOn      set page offset to n characters.*
-rWn      set line width to n characters.*

Terminal type and/or pitch are usually indicated by
the -hp, -ti, -450, -300S, and/or -12 options of the
mm(I) command, if it is used (see Reference 6);
otherwise, they are specified by one of the nroff
-Tname options.
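The page list given to -o combines single pages and inclusive ranges, separated by commas. A minimal Python sketch of that selection rule follows. `page_selector` is a hypothetical name, and it handles only the two forms listed above (k and m-n). Any further list syntax the formatters may accept is not modeled here; check Reference 5 for that.

```python
def page_selector(spec: str):
    """Return a predicate telling whether a page is selected by an -o list.

    Hypothetical sketch covering only "k" and "m-n" items separated by commas.
    """
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            lo, hi = item.split("-", 1)  # m-n: pages m through n inclusive
            ranges.append((int(lo), int(hi)))
        else:
            ranges.append((int(item), int(item)))  # a single page k
    return lambda page: any(lo <= page <= hi for lo, hi in ranges)
```

With `selected = page_selector("2,5-7")`, the pages kept from 1 through 9 are 2, 5, 6, and 7.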


References
1. PWB/MM—Programmer's Workbench Memorandum
   Macros by D. W. Smith and J. R. Mashey.
2. A Tutorial Introduction to the UNIX Text Editor by
   B. W. Kernighan.
3. Tbl—A Program to Format Tables by M. E. Lesk.
4. Typesetting Mathematics—User's Guide (Second
   Edition) by B. W. Kernighan and L. L. Cherry.
5. NROFF/TROFF User's Manual by J. F. Ossanna.
6. PWB/UNIX User's Manual—Edition 1.0 by T. A.
   Dolotta, R. C. Haight, and E. M. Piskorik, eds.


* For nroff, n must be an unscaled number representing
lines or character positions. For troff, n must be scaled.


The PWB/UNIX* document entitled:
PWB/MM Tutorial
is not yet available.


UNIX is a Trademark/Service Mark of the Bell System. 


T.4 


Tbl — A Program to Format Tables

M. E. Lesk

Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

Tbl is a document formatting preprocessor for troff or nroff which makes
even fairly complex tables easy to specify and enter. It is available on the PDP-11
UNIX* system and on Honeywell 6000 GCOS. Tables are made up of columns
which may be independently centered, right-adjusted, left-adjusted, or aligned
by decimal points. Headings may be placed over single columns or groups of
columns. A table entry may contain equations, or may consist of several rows
of text. Horizontal or vertical lines may be drawn as desired in the table, and
any table or element may be enclosed in a box. For example:

1970 Federal Budget Transfers
(in billions of dollars)

New York
New Jersey
Connecticut
Maine
California
New Mexico
Georgia
Mississippi
Texas

September 4, 1977

* UNIX is a Trademark/Service Mark of the Bell System


Introduction. 


Tbl turns a simple description of a table into a troff or nroff [1] program (list of
commands) that prints the table. Tbl may be used on the PDP-11 UNIX [2] system and on the
Honeywell 6000 GCOS system. It attempts to isolate a portion of a job that it can successfully
handle and leave the remainder for other programs. Thus tbl may be used with the equation
formatting program eqn [3] or various layout macro packages [4,5,6], but does not duplicate
their functions.

This memorandum is divided into two parts. First we give the rules for preparing tbl
input; then some examples are shown. The description of rules is precise but technical, and the
beginning user may prefer to read the examples first, as they show some common table
arrangements. A section explaining how to invoke tbl precedes the examples. To avoid
repetition, henceforth read troff as "troff or nroff."

The input to tbl is text for a document, with tables preceded by a ".TS" (table start)
command and followed by a ".TE" (table end) command. Tbl processes the tables, generating
troff formatting commands, and leaves the remainder of the text unchanged. The ".TS" and
".TE" lines are copied, too, so that troff page layout macros (such as the memo formatting
macros [4]) can use these lines to delimit and place tables as they see fit. In particular, any
arguments on the ".TS" or ".TE" lines are copied but otherwise ignored, and may be used by
document layout macro commands.
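The division of labor just described can be pictured as a partition of the input. Lines from .TS through .TE, including those two lines and any arguments on them, form a table for tbl. Every other line passes through untouched. The Python sketch below models only that partition, not the troff code tbl generates. `split_tables` and `is_request` are hypothetical names.

```python
def is_request(line: str, name: str) -> bool:
    """True if line is the troff request `name`, with or without arguments."""
    parts = line.split()
    return bool(parts) and parts[0] == name


def split_tables(lines):
    """Partition document lines into ("text", line) and ("table", [lines]) chunks.

    The .TS and .TE lines (and any arguments on them) stay inside each table
    chunk, so page-layout macros downstream would still see them.
    """
    chunks = []
    table = None
    for line in lines:
        if table is None:
            if is_request(line, ".TS"):
                table = [line]  # start collecting a table
            else:
                chunks.append(("text", line))  # ordinary text passes through
        else:
            table.append(line)
            if is_request(line, ".TE"):
                chunks.append(("table", table))
                table = None
    if table is not None:
        raise ValueError(".TS without a matching .TE")
    return chunks
```

A design note: matching on the first word, rather than on the line prefix, lets `.TS H`-style arguments through while not mistaking an unrelated request that merely begins with the same letters.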


The format of the input is as follows: 


text
.TS
table
.TE
text
.TS
table
.TE
text

where the format of each table is as follows:

.TS
options ;
format .
data
.TE


Each table is independent, and must contain formatting information followed by the data to be 
entered in the table. The formatting information, which describes the individual columns and 
rows of the table, may be preceded by a few options that affect the entire table. A detailed 
description of tables is given in the next section. 


Input commands. 


As indicated above, a table contains, first, global options, then a format section describing
the layout of the table entries, and then the data to be printed. The format and data are always
required, but not the options. The various parts of the table are entered as follows:


1) OPTIONS. There may be a single line of options affecting the whole table. If present, this
line must follow the .TS line immediately and must contain a list of option names
separated by spaces, tabs, or commas, and must be terminated by a semicolon. The
allowable options are:


center — center the table (default is left-adjust); 

expand — make the table as wide as the current line length; 
box — enclose the table in a box; 

allbox — enclose each item in the table in a box; 
doublebox — enclose the table in two boxes; 

tab (x) — use x instead of tab to separate data items. 


The tbl program tries to keep boxed tables on one page by issuing appropriate "need"
(.ne) commands. These requests are calculated from the number of lines in the tables,
and if there are spacing commands embedded in the input, these requests may be
inaccurate; use normal troff procedures, such as keep-release macros, in that case. The user who
must have a multi-page boxed table should use macros designed for this purpose, as
explained below under "Usage."
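Reading the options line comes down to three steps: drop the final semicolon, extract tab(x), and split what remains on spaces, tabs, or commas. tab(x) is extracted first because its x may itself be a comma or a space. The Python sketch below is a hypothetical model of that step, not tbl's parser. The name `parse_options` is illustrative, and rejecting unrecognized words with an error is an assumption.

```python
import re

TABLE_OPTIONS = {"center", "expand", "box", "allbox", "doublebox"}


def parse_options(line: str) -> dict:
    """Parse a tbl options line such as 'center, box, tab(:);' (hypothetical model)."""
    line = line.strip()
    if not line.endswith(";"):
        raise ValueError("the options line must end with a semicolon")
    body = line[:-1]
    opts = {"tab": "\t"}  # data items are tab-separated unless tab(x) says otherwise
    # Pull out tab(x) first: x is any single character, even a comma or a space.
    m = re.search(r"tab\s*\((.)\)", body)
    if m:
        opts["tab"] = m.group(1)
        body = body[:m.start()] + " " + body[m.end():]
    for word in re.split(r"[ \t,]+", body):
        if not word:
            continue
        if word not in TABLE_OPTIONS:
            raise ValueError(f"unknown option {word!r}")  # strictness is an assumption
        opts[word] = True
    return opts
```

For example, `parse_options("center, box, tab(:);")` returns `{"tab": ":", "center": True, "box": True}`.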


2) FORMAT. The format section of the table specifies the layout of the columns. Each line
in this section corresponds to one line of the table (except that the last line corresponds to
all following lines up to the next .T&, if any; see below), and each line contains a
key-letter for each column of the table. It is good practice to separate the key letters for each
column by spaces or tabs. Each key-letter is one of the following:


L or l   to indicate a left-adjusted column entry;
R or r   to indicate a right-adjusted column entry;
C or c   to indicate a centered column entry;
N or n   to indicate a numerical column entry, to be aligned with other numerical
         entries so that the units digits of numbers line up;
A or a   to indicate an alphabetic subcolumn; all corresponding entries are aligned on
         the left, and positioned so that the widest is centered within the column (see
         example on page 12);
S or s   to indicate a spanned heading, i.e. to indicate that the entry from the previous
         column continues across this column (not allowed for the first column,
         obviously); or
^        to indicate a vertically spanned heading, i.e. to indicate that the entry from the
         previous row continues down through this row. (Not allowed for the first row
         of the table, obviously.)


When numerical alignment is specified, a location for the decimal point is sought. The
rightmost dot (.) adjacent to a digit is used as a decimal point; if there is no dot adjoining
a digit, the rightmost digit is used as a units digit; if no alignment is indicated, the item is
centered in the column. However, the special non-printing character string \& may be
used to override unconditionally dots and digits, or to align alphabetic data; this string
lines up where a dot normally would, and then disappears from the final output. In the
example below, the items shown at the left will be aligned (in a numerical column) as
shown on the right:

13            13
4.2            4.2
26.4.12     26.4.12
abc           abc
abc\&        abc
43\&3.22      433.22
749.12       749.12
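The alignment rules above can be modeled in character positions. In order, they are: the rightmost dot next to a digit, otherwise the rightmost digit, with \& overriding both. In the Python sketch below, `align_split` and `align_column` are hypothetical names. The sketch measures widths in characters rather than troff units, and it lets the first \& win if an item has several (an assumption). It also ignores the interplay with wider L, r, or a entries discussed next.

```python
def align_split(item: str):
    """Split a numerical-column entry at its alignment point.

    Returns (left, right), left being the text before the point, or None when
    the item has no dot or digit (tbl then centres it). Hypothetical model.
    """
    marker = "\\&"  # the two characters backslash, ampersand
    if marker in item:
        i = item.index(marker)  # assumption: the first \& wins if several appear
        return item[:i], item[i + len(marker):].replace(marker, "")
    # Rule 1: the rightmost dot adjacent to a digit is the decimal point.
    for i in range(len(item) - 1, -1, -1):
        if item[i] == "." and (
            (i > 0 and item[i - 1].isdigit())
            or (i + 1 < len(item) and item[i + 1].isdigit())
        ):
            return item[:i], item[i:]
    # Rule 2: otherwise the rightmost digit is the units digit.
    for i in range(len(item) - 1, -1, -1):
        if item[i].isdigit():
            return item[:i + 1], item[i + 1:]
    return None


def align_column(items):
    """Render items as equal-width strings with their alignment points lined up."""
    parts = [align_split(x) for x in items]
    left = max((len(p[0]) for p in parts if p), default=0)
    right = max((len(p[1]) for p in parts if p), default=0)
    rows = []
    for item, p in zip(items, parts):
        if p is None:
            rows.append(item.center(left + right))  # no alignment point: centre it
        else:
            rows.append(p[0].rjust(left) + p[1].ljust(right))
    return rows
```

Applied to the seven items at the left of the example, `align_column` returns the right-hand column as equal-width strings. `abc` is centered because it has no dot or digit.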


Note: If numerical data are used in the same column with wider L or r type table entries, 
the widest number is centered relative to the wider L or r items (L is used instead of l for
readability; they have the same meaning as key-letters). Alignment within the numerical 
items is preserved. This is similar to the behavior of a type data, as explained above. 
However, alphabetic subcolumns (requested by the a key-letter) are always slightly 
indented relative to L items; if necessary, the column width is increased to force this. 
This is not true for n type entries. 


Warning: the n and a items should not be used in the same column. 


For readability, the key-letters describing each column should be separated by spaces.
The end of the format section is indicated by a period. The layout of the key-letters in
the format section resembles the layout of the actual data in the table. Thus a simple
format might appear as:

c s s
l n n .

which specifies a table of three columns. The first line of the table contains a heading
centered across all three columns; each remaining line contains a left-adjusted item in the
first column followed by two columns of numerical data. A sample table in this format
might be:


        Overall title

Item-a          34.22    9.1
Item-b          12.65     .02
Items: c,d,e    23       5.8
Total           69.87   14.92


There are some additional features of the key-letter system: 


Horizontal lines — A key-letter may be replaced by ‘_’ (underscore) to indicate a
horizontal line in place of the corresponding column entry, or by ‘=’ to indicate a
double horizontal line. If any data entry is provided for this column, it is ignored and a
warning message is printed.

Vertical lines — A vertical bar may be placed between column key-letters. This will 
cause a vertical line between the corresponding columns of the table. A vertical bar 
to the left of the first key-letter or to the right of the last one produces a line at the 
edge of the table. If two vertical bars appear between key-letters, a double vertical 
line is drawn. 


Space between columns — A number may follow the key-letter. This indicates the 
amount of separation between this column and the next column. The number nor- 
mally specifies the separation in ens (one en is about the width of the letter ‘n’).* If 
the “‘expand’’ option is used, then these numbers are multiplied by a constant such 
that the table is as wide as the current line length. The default column separation 
number is 3. If the separation is changed the worst case (largest space requested) 
governs. 


* More precisely, an en is a number of points (1 point = 1/72 inch) equal to half the current type size. 




Vertical spanning — Normally, vertically spanned items extending over several rows of
the table are centered in their vertical range. If a key-letter is followed by t or T, 
any corresponding vertically spanned item will begin at the top line of its range. 


Font changes — A key-letter may be followed by a string containing a font name or
number preceded by the letter f or F. This indicates that the corresponding column
should be in a different font from the default font (usually Roman). All font names
are one or two letters; a one-letter font name should be separated from whatever
follows by a space or tab. The single letters B, b, I, and i are shorter synonyms for
fB and fI. Font change commands given with the table entries override these
specifications.


Point size changes — A key-letter may be followed by the letter p or P and a number to 
indicate the point size of the corresponding table entries. The number may be a 
signed digit, in which case it is taken as an increment or decrement from the current
point size. If both a point size and a column separation value are given, one or
more blanks must separate them. 


Column width indication — A key-letter may be followed by the letter w or W and a width
value in parentheses. This width is used as a minimum column width. If the largest
element in the column is not as wide as the width value given after the w, the largest
element is assumed to be that wide. If the largest element in the column is
wider than the specified value, its width is used. The width is also used as a default
line length for included text blocks. Normal troff units can be used to scale the
width value; if none are used, the default is ens. If the width specification is a
unitless integer the parentheses may be omitted. If the width value is changed in a
column, the last one given controls.


Equal width columns — A key-letter may be followed by the letter e or E to indicate
equal width columns. All columns whose key-letters are followed by e or E are 
made the same width. This permits the user to get a group of regularly spaced 
columns. 


Note: The order of the above features is immaterial; they need not be separated by 
spaces, except as indicated above to avoid ambiguities involving point size and font 
changes. Thus a numerical column entry in italic font and 12 point type with a 
minimum width of 2.5 inches and separated by 6 ens from the next column could be 
specified as 

np12w(2.5i)fI 6


Alternative notation — Instead of listing the format of successive lines of a table on con- 
secutive lines of the format section, successive line formats may be given on the 
same line, separated by commas, so that the format for the example above might 
have been written: 

css,lnn. 


Default — Column descriptors missing from the end of a format line are assumed to be
L. The longest line in the format section, however, defines the number of columns
in the table; extra columns in the data are ignored silently.
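The comma notation and the default rule combine naturally in a small parser. The Python sketch below is a hypothetical model, not tbl's own code. `parse_format` handles only bare key-letters and skips vertical bars. It does not interpret the width, font, point-size, spacing, or other modifiers described above. It also folds upper- and lower-case key-letters together, since they have the same meaning.

```python
def parse_format(section: str):
    """Turn a tbl format section into rows of lower-case key-letters.

    Hypothetical sketch: bare key-letters only; '|' is skipped and modifiers
    such as widths, fonts, and point sizes are not supported.
    """
    body = section.strip()
    if not body.endswith("."):
        raise ValueError("the format section must end with a period")
    body = body[:-1]
    rows = []
    # Commas separate successive line formats just as newlines do.
    for line in body.replace(",", "\n").splitlines():
        keys = [c.lower() for c in line if not c.isspace() and c != "|"]
        if keys:
            rows.append(keys)
    if not rows:
        raise ValueError("empty format section")
    # The longest line fixes the column count; short lines are padded with l.
    ncols = max(len(r) for r in rows)
    return [r + ["l"] * (ncols - len(r)) for r in rows]
```

Both `parse_format("css,lnn.")` and `parse_format("c s s\nl n n .")` return `[["c", "s", "s"], ["l", "n", "n"]]`.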


3) DATA. The data for the table are typed after the format. Normally, each table line is
typed as one line of data. Very long input lines can be broken: any line whose last
character is \ is combined with the following line (and the \ vanishes). The data for different
columns (the table entries) are separated by tabs, or by whatever character has been
specified with the tab (x) option. There are a few special cases:


Troff commands within tables — An input line beginning with a ‘.’ followed by anything
but a number is assumed to be a command to troff and is passed through unchanged,
retaining its position in the table. So, for example, space within a table may be
produced by ‘.sp’ commands in the data.




Full width horizontal lines — An input line containing only the character _ (underscore) or
= (equal sign) is taken to be a single or double line, respectively, extending the full
width of the table.


Single column horizontal lines — An input table entry containing only the character _ or =
is taken to be a single or double line extending the full width of the column. To 
obtain these characters explicitly in a column, either precede them by \& or follow 
them by a space before the usual tab or newline. 


Vertically spanned items — An input table entry containing only the character string \^
indicates that the table entry immediately above spans downward over this row. It is
equivalent to a table format key-letter of ‘^’.


Text blocks — In order to include a block of text as a table entry, precede it by T{ and
follow it by T}. Thus the sequence

     ...T{
     block of
     text
     T}...

is the way to enter, as a single entry in the table, something that cannot
conveniently be typed as a simple string between tabs. Note that the T} end delimiter
must begin a line; additional columns of data may follow after a tab on the same
line. See the example on page 10 for an illustration of included text blocks in a
table. If more than twenty or thirty text blocks are used in a table, various limits in
the troff program are likely to be exceeded, producing diagnostics such as ‘too many
string/macro names’ or ‘too many number registers.’


Text blocks are pulled out from the table, processed separately by troff, and replaced
in the table as a solid block. If no line length is specified in the block of text itself
or in the table format, the default is to use L×C/(N+1), where L is the current line
length, C is the number of table columns spanned by the text, and N is the total
number of columns in the table. The other parameters (point size, font, etc.) used
in setting the block of text are those in effect at the beginning of the table (including
the effect of the ‘.TS’ macro) and any table format specifications of size and font,
using the p and f modifiers to the column key-letters. Commands within the text
block itself are also recognized, of course. However, troff commands within the
table data but not within the text block do not affect that block.


Warnings: Although any number of lines may be present in a table, only the first 200
lines are used in calculating the widths of the various columns. A multi-page table,
of course, may be arranged as several single-page tables if this proves to be a problem.
Other difficulties with formatting may arise because, in the calculation of
column widths, all table entries are assumed to be in the font and size being used
when the ‘.TS’ command was encountered, except for font and size changes indicated
(a) in the table format section and (b) within the table data (as in the entry
\s+3\fIdata\fP\s0). Therefore, although arbitrary troff requests may be sprinkled in
a table, care must be taken to avoid confusing the width calculations; use requests
such as ‘.ps’ with care.
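The data rules of this section can be summarized in a short model. This Python sketch is only illustrative. It assumes the data lines have already been separated from the format section, it ignores T{ ... T} text blocks, and its function names are hypothetical. The width helper simply evaluates L×C/(N+1).

```python
def classify_entry(entry):
    """Name the special meaning, if any, of a single table entry."""
    if entry == "_":
        return "single line"      # rule across this column only
    if entry == "=":
        return "double line"
    if entry == "\\^":
        return "vertical span"    # the entry above spans down over this row
    return "text"                 # includes "\\&_" and "_ " (literal characters)


def _classify_line(line, tab):
    if line.startswith(".") and not line[1:2].isdigit():
        return ("troff", line)                 # passed through unchanged
    if line in ("_", "="):
        return ("full-width rule", line)
    return ("data", line.split(tab))


def parse_data(lines, tab="\t"):
    """Join continuation lines, then classify each logical line."""
    rows, pending = [], ""
    for raw in lines:
        if raw.endswith("\\"):                 # continuation: drop the backslash
            pending += raw[:-1]
            continue
        rows.append(_classify_line(pending + raw, tab))
        pending = ""
    if pending:                                # dangling continuation at end of input
        rows.append(_classify_line(pending, tab))
    return rows


def default_block_width(line_length, spanned, total_columns):
    """Default text-block width L*C/(N+1), in the units of line_length."""
    return line_length * spanned / (total_columns + 1)


if __name__ == "__main__":
    print(parse_data(["Name\tPhone", "Joe\t555-\\", "1212", ".sp", "_"]))
    print(default_block_width(6.5, 1, 3))      # 1.625 (inches, for a 6.5i line)
```

Note that a line such as ".5" counts as data, because its period is followed by a number.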


4) ADDITIONAL COMMAND LINES. If the format of a table must be changed after many similar
lines, as with sub-headings or summarizations, the ‘.T&’ (table continue) command
can be used to change column parameters. The outline of such a table input is:

.TS
options ;
format .
data
.T&
format .
data
.T&
format .
data
.TE

as in the examples on pages 9 and 12. Using this procedure, each table line can be close
to its corresponding format line.

Warning: it is not possible to change the number of columns, the space between columns,
the global options such as box, or the selection of columns to be made equal width.
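The .T& outline can also be produced mechanically. The sketch below is Python with a hypothetical build_table helper. It assembles tbl input from a list of (format, data) sections. Following the warning above, it refuses a continuation section that asks for more columns than the first section. The other restrictions are not checked.

```python
def build_table(options, sections, tab="\t"):
    """Return tbl input text for (format_lines, data_rows) sections.

    format_lines: list of lists of key-letters, e.g. [["l", "n"], ["a", "n"]]
    data_rows:    list of lists of entries, joined with `tab`
    """
    lines = [".TS"]
    if options:
        lines.append(options + ";")
    ncols = None
    for i, (fmt, rows) in enumerate(sections):
        width = max(len(f) for f in fmt)
        if ncols is None:
            ncols = width                      # first format section fixes the columns
        elif width > ncols:
            raise ValueError(f".T& section {i} uses {width} columns; table has {ncols}")
        if i > 0:
            lines.append(".T&")
        fmt_text = [" ".join(f) for f in fmt]
        fmt_text[-1] += "."                    # a period ends each format section
        lines.extend(fmt_text)
        lines.extend(tab.join(r) for r in rows)
    lines.append(".TE")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_table("box", [([["c", "s"]], [["Title"]]),
                              ([["l", "n"]], [["a", "1"]])]))
```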




Usage.
On UNIX, tbl can be run on a simple table with the command

tbl input-file | troff

but for more complicated use, where there are several input files, and they contain equations
and ms memorandum layout commands as well as tables, the normal command would be

tbl file-1 file-2 ... | eqn | troff -ms

and, of course, the usual options may be used on the troff and eqn commands. The usage for
nroff is similar to that for troff, but only TELETYPE® Model 37 and Diablo-mechanism (DASI or
GSI) terminals can print boxed tables.


Note that when eqn and tbl are used together on the same file, tbl should be used first. If
there are no equations within tables, either order works, but it is usually faster to run tbl first,
since eqn normally produces a larger expansion of the input than tbl. However, if there are
equations within tables (using the delim mechanism in eqn), tbl must be first or the output will
be scrambled. Users must also beware of using equations in n-style columns; this is nearly
always wrong, since tbl attempts to split numerical format items into two parts and this is not
possible with equations.
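The ordering rule amounts to always putting tbl at the head of the pipeline. This hypothetical Python helper only builds the command string shown earlier. It does not run anything, and the caller supplies the macro-package argument.

```python
def formatting_pipeline(files, has_equations=False, macros="-ms"):
    """Return the shell pipeline text; tbl always comes first."""
    stages = ["tbl " + " ".join(files)]
    if has_equations:
        stages.append("eqn")                   # eqn follows tbl, never the reverse
    stages.append(("troff " + macros) if macros else "troff")
    return " | ".join(stages)


if __name__ == "__main__":
    print(formatting_pipeline(["file-1", "file-2"], has_equations=True))
    print(formatting_pipeline(["input-file"], macros=""))
```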


Tbl limits tables to twenty columns; however, use of more than 16 numerical columns
may fail because of limits in troff, producing the ‘too many number registers’ message. Troff
number registers used by tbl must be avoided by the user within tables; these include two-digit
names from 31 to 99, and names of the forms #x, x+, x|, ^x, and x-, where x is any lower
case letter. The names ##, #-, and #^ are also used in certain circumstances. To conserve
number register names, the n and a formats share a register; hence the restriction above that
they may not be used in the same column.
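The register names to avoid can be checked mechanically. This hypothetical Python helper encodes the forms listed above, plus the TW register that tbl sets to the table width. It does not cover registers used by macro packages such as ms.

```python
import string

LOWER = string.ascii_lowercase


def tbl_reserves(name):
    """True if a register name may clash with the registers tbl uses."""
    if name == "TW":                           # table width, set by tbl
        return True
    if len(name) != 2:
        return False
    if all(c in string.digits for c in name) and 31 <= int(name) <= 99:
        return True
    if name in ("##", "#-", "#^"):
        return True
    first, second = name
    if first == "#" and second in LOWER:       # #x
        return True
    if first == "^" and second in LOWER:       # ^x
        return True
    if first in LOWER and second in "+|-":     # x+, x|, x-
        return True
    return False


if __name__ == "__main__":
    for reg in ("31", "30", "#a", "b+", "xy"):
        print(reg, tbl_reserves(reg))
```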


For aid in writing layout macros, tbl defines a number register TW which is the table
width; it is defined by the time that the ‘.TE’ macro is invoked and may be used in the
expansion of that macro. More importantly, to assist in laying out multi-page boxed tables the
macro T# is defined to produce the bottom lines and side lines of a boxed table, and then
invoked at its end. By use of this macro in the page footer a multi-page table can be boxed. In
particular, the ms macros can be used to print a multi-page boxed table with a repeated heading
by giving the argument H to the ‘.TS’ macro. If the table start macro is written

.TS H

a line of the form

.TH

must be given in the table after any table heading (or at the start if none). Material up to the
‘.TH’ is placed at the top of each page of table; the remaining lines in the table are placed on
several pages as required. Note that this is not a feature of tbl, but of the ms layout macros.


Examples. 


Here are some examples illustrating features of tbl. The symbol @ in the input
represents a tab character.


Input:

.TS
box;
c c c
l l l.
Language@Authors@Runs on

Fortran@Many@Almost anything
PL/1@IBM@360/370
C@BTL@11/45,H6000,370
BLISS@Carnegie-Mellon@PDP-10,11
IDS@Honeywell@H6000
Pascal@Stanford@370
.TE

Output:

Language       Authors           Runs on

Fortran    Many              Almost anything
PL/1       IBM               360/370
C          BTL               11/45,H6000,370
BLISS      Carnegie-Mellon   PDP-10,11
IDS        Honeywell         H6000
Pascal     Stanford          370

Input:

.TS
allbox;
c s s
c c c
n n n.
AT&T Common Stock
Year@Price@Dividend
1971@41-54@$2.60
2@41-54@2.70
3@46-55@2.87
4@40-53@3.24
5@45-52@3.40
6@51-59@.95*
.TE
* (first quarter only)

Output:

    AT&T Common Stock
Year    Price    Dividend
1971    41-54      $2.60
   2    41-54       2.70
   3    46-55       2.87
   4    40-53       3.24
   5    45-52       3.40
   6    51-59        .95*

* (first quarter only)


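Input:

.TS
box;
c s s
c | c | c
l | l | n.
Major New York Bridges
_
Bridge@Designer@Length
_
Brooklyn@J. A. Roebling@1595
Manhattan@G. Lindenthal@1470
Williamsburg@L. L. Buck@1600
_
Queensborough@Palmer &@1182
@Hornbostel
_
@@1380
Triborough@O. H. Ammann@_
@@383
_
Bronx Whitestone@O. H. Ammann@2300
Throgs Neck@O. H. Ammann@1800
_
George Washington@O. H. Ammann@3500
.TE

Output:

          Major New York Bridges
-------------------------------------------
     Bridge       |    Designer    | Length
-------------------------------------------
Brooklyn          | J. A. Roebling |   1595
Manhattan         | G. Lindenthal  |   1470
Williamsburg      | L. L. Buck     |   1600
-------------------------------------------
Queensborough     | Palmer &       |   1182
                  | Hornbostel     |
-------------------------------------------
                  |                |   1380
Triborough        | O. H. Ammann   | ------
                  |                |    383
-------------------------------------------
Bronx Whitestone  | O. H. Ammann   |   2300
Throgs Neck       | O. H. Ammann   |   1800
-------------------------------------------
George Washington | O. H. Ammann   |   3500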


Input:

.TS
c c
np-2 | n | .
@Stack
...
.TE

Input:

.TS
...
january@february@march
april@may
june@july@Months
august@september
october@november@december
.TE

Output:

january    february     march
april      may
june       july         Months
august     september
october    november     december

Input:

.TS
box;
c fB s s s.
Composition of Foods
.T&
c | c s s
c | c s s
c | c | c | c.
Food@Percent by Weight
\^@_
\^@Protein@Fat@Carbo-
\^@\^@\^@hydrate
.T&
l | n | n | n.
Apples@.4@.5@13.0
Halibut@18.4@5.2@. . .
Lima beans@7.5@.8@22.0
Milk@3.3@4.0@5.0
Mushrooms@3.5@.4@6.0
Rye bread@9.0@.6@52.7
.TE

Output:

          Composition of Foods
                  Percent by Weight
            -----------------------------
Food        Protein      Fat      Carbo-
                                  hydrate
Apples         .4        .5        13.0
Halibut      18.4       5.2        . . .
Lima beans    7.5        .8        22.0
Milk          3.3       4.0         5.0
Mushrooms     3.5        .4         6.0
Rye bread     9.0        .6        52.7
Input:

.TS
allbox;
c s s
c cw(1i) cw(1i)
lp9 lp9 lp9.
New York Area Rocks
Era@Formation@Age (years)
Precambrian@Reading Prong@>1 billion
Paleozoic@Manhattan Prong@400 million
Mesozoic@T{
.na
Newark Basin, incl.
Stockton, Lockatong, and Brunswick
formations; also Watchungs
and Palisades.
T}@200 million
Cenozoic@Coastal Plain@T{
On Long Island 30,000 years;
Cretaceous sediments redeposited
by recent glaciation.
.ad
T}
.TE

Output:

              New York Area Rocks
Era           Formation          Age (years)
Precambrian   Reading Prong      >1 billion
Paleozoic     Manhattan Prong    400 million
Mesozoic      Newark Basin,      200 million
              incl. Stockton,
              Lockatong, and
              Brunswick for-
              mations; also
              Watchungs and
              Palisades.
Cenozoic      Coastal Plain      On Long Island
                                 30,000 years;
                                 Cretaceous sedi-
                                 ments redepo-
                                 sited by recent
                                 glaciation.

Input:

.EQ
delim $$
.EN

.TS
doublebox;
c c
l l.
Name@Definition
.sp
.vs +2p
Gamma@$GAMMA (z) = int sub 0 sup inf t sup {z-1} e sup -t dt$
Sine@$sin (x) = 1 over 2i ( e sup ix - e sup -ix )$
Error@$ roman erf (z) = 2 over sqrt pi int sub 0 sup z e sup {-t sup 2} dt$
Bessel@$ J sub 0 (z) = 1 over pi int sub 0 sup pi cos ( z sin theta ) d theta $
Zeta@$ zeta (s) = sum from k=1 to inf k sup -s ~~( Re~s > 1)$
.vs -2p
.TE

Output:

Name      Definition

Gamma     $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt$
Sine      $\sin(x) = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right)$
Error     $\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$
Bessel    $J_0(z) = \frac{1}{\pi} \int_0^\pi \cos(z \sin\theta)\,d\theta$
Zeta      $\zeta(s) = \sum_{k=1}^{\infty} k^{-s} \quad (\operatorname{Re} s > 1)$


Input:

.TS
box, tab(:);
cb s s s s
cp-2 s s s s
c || c | c | c | c
c || c | c | c | c
r2 || n2 | n2 | n2 | n.
Readability of Text
Line Width and Leading for 10-Point Type
Line:Set:1-Point:2-Point:4-Point
Width:Solid:Leading:Leading:Leading
9 Pica:\-9.3:\-6.0:\-5.3:\-7.1
...
.TE

Output:

             Readability of Text
  Line Width and Leading for 10-Point Type
 Line  ||  Set  | 1-Point | 2-Point | 4-Point
Width  || Solid | Leading | Leading | Leading
9 Pica ||  -9.3 |    -6.0 |    -5.3 |    -7.1
...


Input:

.TS
c s
cip-2 s
l n
a n.
Some London Transport Statistics
(Year 1964)
Railway route miles@244
Tube@66
Sub-surface@22
Surface@156
.sp .5
.T&
l r
a r.
Passenger traffic \- railway
Journeys@674 million
Average length@4.55 miles
Passenger miles@3,066 million
.T&
l r
a r.
Passenger traffic \- road
Journeys@2,252 million
Average length@2.26 miles
Passenger miles@5,094 million
.T&
l n
a n.
.sp .5
Vehicles@12,521
Railway motor cars@2,905
Railway trailer cars@1,269
Total railway@4,174
Omnibuses@8,347
.T&
l n
a n.
.sp .5
Staff@73,739
Administrative, etc.@5,582
Civil engineering@5,134
Electrical eng.@1,714
Mech. eng. \- railway@4,310
Mech. eng. \- road@9,152
Railway operations@8,930
Road operations@35,946
Other@2,971
.TE

Output:

  Some London Transport Statistics
             (Year 1964)
Railway route miles               244
  Tube                             66
  Sub-surface                      22
  Surface                         156

Passenger traffic - railway
  Journeys                674 million
  Average length           4.55 miles
  Passenger miles       3,066 million
Passenger traffic - road
  Journeys              2,252 million
  Average length           2.26 miles
  Passenger miles       5,094 million

Vehicles                       12,521
  Railway motor cars            2,905
  Railway trailer cars          1,269
  Total railway                 4,174
  Omnibuses                     8,347

Staff                          73,739
  Administrative, etc.          5,582
  Civil engineering             5,134
  Electrical eng.               1,714
  Mech. eng. - railway          4,310
  Mech. eng. - road             9,152
  Railway operations            8,930
  Road operations              35,946
  Other                         2,971

Input:

.TS
center box;
cB s s
cI s s
c c c
lB l n.
New Jersey Representatives
(Democrats)
.sp .5
Name@Office address@Phone
.sp .5
James J. Florio@23 S. White Horse Pike, Somerdale 08083@609-627-8222
William J. Hughes@2920 Atlantic Ave., Atlantic City 08401@609-345-4844
James J. Howard@801 Bangs Ave., Asbury Park 07712@201-774-1600
Frank Thompson, Jr.@10 Rutgers Pl., Trenton 08618@609-599-1619
Andrew Maguire@115 W. Passaic St., Rochelle Park 07662@201-843-0240
Robert A. Roe@U.S.P.O., 194 Ward St., Paterson 07510@201-523-5152
Henry Helstoski@666 Paterson Ave., East Rutherford 07073@201-939-9090
Peter W. Rodino, Jr.@Suite 1435A, 970 Broad St., Newark 07102@201-645-3213
Joseph G. Minish@308 Main St., Orange 07050@201-645-6363
Helen S. Meyner@32 Bridge St., Lambertville 08530@609-397-1830
Dominick V. Daniels@895 Bergen Ave., Jersey City 07306@201-659-7700
Edward J. Patten@Natl. Bank Bldg., Perth Amboy 08861@201-826-4610
.sp .5
.T&
cI s s
lB l n.
(Republicans)
.sp .5v
Millicent Fenwick@41 N. Bridge St., Somerville 08876@201-722-8200
Edwin B. Forsythe@301 Mill St., Moorestown 08057@609-235-6622
Matthew J. Rinaldo@1961 Morris Ave., Union 07083@201-687-4235
.TE

Output:

                         New Jersey Representatives
                                (Democrats)

        Name                       Office address                  Phone

James J. Florio       23 S. White Horse Pike, Somerdale 08083   609-627-8222
William J. Hughes     2920 Atlantic Ave., Atlantic City 08401   609-345-4844
James J. Howard       801 Bangs Ave., Asbury Park 07712         201-774-1600
Frank Thompson, Jr.   10 Rutgers Pl., Trenton 08618             609-599-1619
Andrew Maguire        115 W. Passaic St., Rochelle Park 07662   201-843-0240
Robert A. Roe         U.S.P.O., 194 Ward St., Paterson 07510    201-523-5152
Henry Helstoski       666 Paterson Ave., East Rutherford 07073  201-939-9090
Peter W. Rodino, Jr.  Suite 1435A, 970 Broad St., Newark 07102  201-645-3213
Joseph G. Minish      308 Main St., Orange 07050                201-645-6363
Helen S. Meyner       32 Bridge St., Lambertville 08530         609-397-1830
Dominick V. Daniels   895 Bergen Ave., Jersey City 07306        201-659-7700
Edward J. Patten      Natl. Bank Bldg., Perth Amboy 08861       201-826-4610

                               (Republicans)

Millicent Fenwick     41 N. Bridge St., Somerville 08876        201-722-8200
Edwin B. Forsythe     301 Mill St., Moorestown 08057            609-235-6622
Matthew J. Rinaldo    1961 Morris Ave., Union 07083             201-687-4235


This is a paragraph of normal text placed here only to indicate where the left and right margins 
are. In this way the reader can judge the appearance of centered tables or expanded tables, and 
observe how such tables are formatted. 


Input:

.TS
expand;
c s s s
c c c c
l l n n.
Bell Labs Locations
Name@Address@Area Code@Phone
Holmdel@Holmdel, N. J. 07733@201@949-3000
Murray Hill@Murray Hill, N. J. 07974@201@582-6377
Whippany@Whippany, N. J. 07981@201@386-3000
Indian Hill@Naperville, Illinois 60540@312@690-2000
.TE

Output:

                            Bell Labs Locations
   Name                    Address                 Area Code        Phone
Holmdel           Holmdel, N. J. 07733                   201       949-3000
Murray Hill       Murray Hill, N. J. 07974               201       582-6377
Whippany          Whippany, N. J. 07981                  201       386-3000
Indian Hill       Naperville, Illinois 60540             312       690-2000


Input:

.TS
box;
cb s s s
c | c | c s
ltiw(1i) | ltw(2i) | lp8 | lw(1.6i)p8.
Some Interesting Places
Name@Description@Practical Information
T{
American Museum of Natural History
T}@T{
The collections fill 11.5 acres (Michelin) or 25 acres (MTA)
of exhibition halls on four floors. There is a full-sized replica
of a blue whale and the world's largest star sapphire (stolen in 1964).
T}@Hours@10-5, ex. Sun 11-5, Wed. to 9
\^@\^@Location@T{
Central Park West & 79th St.
T}
\^@\^@Admission@Donation: $1.00 asked
\^@\^@Subway@AA to 81st St.
\^@\^@Telephone@212-873-4225
Bronx Zoo@T{
About a mile long and .6 mile wide, this is the largest zoo in America.
A lion eats 18 pounds
of meat a day while a sea lion eats 15 pounds of fish.
T}@Hours@T{
10-4:30 winter, to 5:00 summer
T}
\^@\^@Location@T{
185th St. & Southern Blvd, the Bronx.
T}
\^@\^@Admission@$1.00, but Tu, We, Th free
\^@\^@Subway@2, 5 to East Tremont Ave.
\^@\^@Telephone@212-933-1759
Brooklyn Museum@T{
Five floors of galleries contain American and ancient art.
There are American period rooms and architectural ornaments saved
from wreckers, such as a classical figure from Pennsylvania Station.
T}@Hours@Wed-Sat, 10-5, Sun 12-5
\^@\^@Location@T{
Eastern Parkway & Washington Ave., Brooklyn.
T}
\^@\^@Admission@Free
\^@\^@Subway@2,3 to Eastern Parkway.
\^@\^@Telephone@212-638-5000
T{
New-York Historical Society
T}@T{
All the original paintings for Audubon's
.I
Birds of America
.R
are here, as are exhibits of American decorative arts, New York history,
Hudson River school paintings, carriages, and glass paperweights.
T}@Hours@T{
Tues-Fri & Sun, 1-5; Sat 10-5
T}
\^@\^@Location@T{
Central Park West & 77th St.
T}
\^@\^@Admission@Free
\^@\^@Subway@AA to 81st St.
\^@\^@Telephone@212-873-3400
.TE

Output:

                                   Some Interesting Places
     Name                   Description                       Practical Information
American Muse-   The collections fill 11.5 acres    Hours      10-5, ex. Sun 11-5, Wed. to 9
um of Natural    (Michelin) or 25 acres (MTA)       Location   Central Park West & 79th St.
History          of exhibition halls on four        Admission  Donation: $1.00 asked
                 floors. There is a full-sized re-  Subway     AA to 81st St.
                 plica of a blue whale and the      Telephone  212-873-4225
                 world's largest star sapphire
                 (stolen in 1964).
Bronx Zoo        About a mile long and .6 mile      Hours      10-4:30 winter, to 5:00 summer
                 wide, this is the largest zoo in   Location   185th St. & Southern Blvd, the
                 America. A lion eats 18                       Bronx.
                 pounds of meat a day while a       Admission  $1.00, but Tu, We, Th free
                 sea lion eats 15 pounds of fish.   Subway     2, 5 to East Tremont Ave.
                                                    Telephone  212-933-1759
Brooklyn Museum  Five floors of galleries contain   Hours      Wed-Sat, 10-5, Sun 12-5
                 American and ancient art.          Location   Eastern Parkway & Washington
                 There are American period                     Ave., Brooklyn.
                 rooms and architectural orna-      Admission  Free
                 ments saved from wreckers,         Subway     2,3 to Eastern Parkway.
                 such as a classical figure from    Telephone  212-638-5000
                 Pennsylvania Station.
New-York His-    All the original paintings for     Hours      Tues-Fri & Sun, 1-5; Sat 10-5
torical Society  Audubon's Birds of America are     Location   Central Park West & 77th St.
                 here, as are exhibits of Ameri-    Admission  Free
                 can decorative arts, New York      Subway     AA to 81st St.
                 history, Hudson River school       Telephone  212-873-3400
                 paintings, carriages, and glass
                 paperweights.
List of Tbl Command Characters and Words

Command      Meaning                             Section
a A          Alphabetic subcolumn                2
allbox       Draw box around all items           1
b B          Boldface item                       2
box          Draw box around table               1
c C          Centered column                     2
center       Center table in page                1
doublebox    Doubled box around table            1
e E          Equal width columns                 2
expand       Make table full line width          1
f F          Font change                         2
i I          Italic item                         2
l L          Left adjusted column                2
n N          Numerical column                    2
nnn          Column separation                   2
p P          Point size change                   2
r R          Right adjusted column               2
s S          Spanned item                        2
t T          Vertical spanning at top            2
tab (x)      Change data separator character     1
T{ T}        Text block                          3
w W          Minimum width value                 2
.xx          Included troff command              3
|            Vertical line                       2
||           Double vertical line                2
^            Vertical span                       2
\^           Vertical span                       3
=            Double horizontal line              2,3
_            Horizontal line                     2,3
A TROFF Tutorial


Brian W. Kernighan 


Bell Laboratories
Murray Hill, New Jersey 07974 


ABSTRACT 


troff is a text-formatting program for driving the Graphic Systems
phototypesetter on the UNIX and GCOS operating systems. This device is capable of
producing high-quality text; this paper is an example of troff output.


The phototypesetter itself normally runs with four fonts, containing
roman, italic and bold letters (as on this page), a full greek alphabet, and a
substantial number of special characters and mathematical symbols. Characters can
be printed in a range of sizes, and placed anywhere on the page.


troff allows the user full control over fonts, sizes, and character positions,
as well as the usual features of a formatter: right-margin justification,
automatic hyphenation, page titling and numbering, and so on. It also provides
macros, arithmetic variables and operations, and conditional testing, for
complicated formatting tasks.


This document is an introduction to the most basic use of troff. It 
presents just enough information to enable the user to do simple formatting 
tasks like making viewgraphs, and to make incremental changes to existing 
packages of troff commands. It assumes that the reader is familiar with a for- 
matter like roff on UNIX or GCOS. In most respects, the UNIX formatter nroff 
is identical to troff, so this document also serves as a tutorial on nroff. 




1. Introduction 


troff [1] is a text-formatting program, written by J. F. Ossanna, for producing
high-quality printed output from the phototypesetter on the UNIX and GCOS operating
systems. This document is an example of troff output.


The single most important rule of using troff is not to use it directly, but through
some intermediary. In many ways, troff resembles an assembly language (a remarkably
powerful and flexible one), but nonetheless such that many operations must be specified
at a level of detail and in a form that is too hard for most people to use effectively.


For two special applications, there are programs that provide an interface to troff
for the majority of users. eqn [2] provides an easy-to-learn language for typesetting
mathematics; the eqn user need know no troff whatsoever to typeset mathematics. tbl [3]
provides the same convenience for producing tables of arbitrary complexity.


For producing straight text (which may well contain mathematics or tables), there are a
number of ‘macro packages’ that define formatting rules and operations for specific styles
of documents, and reduce the amount of direct contact with troff. In particular, the ‘-ms’
[4] and PWB/MM [5] packages for Bell Labs internal memoranda and external papers provide
most of the facilities needed for a wide range of document preparation. (This memo was
prepared with ‘-ms’.) There are also packages for viewgraphs, for simulating the older roff
formatters on UNIX and GCOS, and for other special applications. Typically you will find
these packages easier to use than troff once you get beyond the most trivial operations;
you should always consider them first.


In the few cases where existing packages don't do the whole job, the solution is not to
write an entirely new set of troff instructions from scratch, but to make small changes to
adapt packages that already exist.


In accordance with this philosophy of letting someone else do the work, the part of troff
described here is only a small part of the whole, although it tries to concentrate on the
more useful parts. In any case, there is no attempt to be complete. Rather, the emphasis is
on showing how to do simple things, and how to make incremental changes to what already
exists. The contents of the remaining sections are:


2. Point sizes and line spacing 
3. Fonts and special characters 
4. Indents and line length 
5. Tabs 
6. Local motions: Drawing lines and characters 
7. Strings 
8. Introduction to macros 
9. Titles, pages and numbering 
10. Number registers and arithmetic 
11. Macros with arguments
12. Conditionals 
13. Environments 
14. Diversions 
Appendix: Typesetter character set 


The troff described here is the C-language ver- 
sion running on UNIX at Murray Hill, as docu- 
mented in [1]. 


To use troff you have to prepare not only the actual text you want printed, but some
information that tells how you want it printed. (Readers who use roff will find the approach
familiar.) For troff the text and the formatting information are often intertwined quite
intimately. Most commands to troff are placed on a line separate from the text itself,
beginning with a period (one command per line). For example,


Some text. 
.ps 14
Some more text. 


will change the ‘point size’, that is, the size of 
the letters being printed, to ‘14 point’ (one point 
is 1/72 inch) like this: 


Some text. Some more text.


Occasionally, though, something special occurs in the middle of a line: to produce

$\mathrm{Area} = \pi r^{2}$

you have to type

Area = \(*p\fIr\fR\|\s8\u2\d\s0


(which we will explain shortly). The backslash 
character \ is used to introduce troff commands 
and special characters within a line of text. 


2. Point Sizes; Line Spacing 


As mentioned above, the command .ps sets the point size. One point is 1/72 inch, so
6-point characters are at most 1/12 inch high, and 36-point characters are 1/2 inch.
There are 15 point sizes, listed below.


6 point: Pack my box with five dozen liquor jugs.

7 point: Pack my box with five dozen liquor jugs. 

8 point: Pack my box with five dozen liquor jugs. 

9 point: Pack my box with five dozen liquor jugs. 
10 point: Pack my box with five dozen liquor 


11 point: Pack my box with five dozen 


12 point: Pack my box with five dozen 


14 point: Pack my box with five 


16 point 18 point 20 point 


22 24 28 36 


If the number after .ps is not one of these legal sizes, it is rounded up to the next
valid value, with a maximum of 36. If no number follows .ps, troff reverts to the
previous size, whatever it was. troff begins with point size 10, which is usually
fine. This document is in 9 point.
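The rounding rule is easy to state in code. This is a minimal Python sketch under two assumptions: the fifteen sizes listed above are the complete set, and a request below 6 is also rounded up to 6 (the text does not say what happens in that case).

```python
LEGAL_SIZES = (6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, 36)


def effective_point_size(requested):
    """Round a .ps request up to the next legal size, capped at 36."""
    for size in LEGAL_SIZES:
        if size >= requested:
            return size
    return LEGAL_SIZES[-1]


if __name__ == "__main__":
    for request in (9, 13, 30, 40):
        print(request, "->", effective_point_size(request))
```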


The point size can also be changed in the 
middle of a line or even a word with the in-line 
command \s. To produce 


UNIX runs on a PDP-11/45


type 
\s8UNIX\s10 runs on a \s8PDP-\s1011/45 


As above, \s should be followed by a legal point size, except that \s0 causes the size
to revert to its previous value. Notice that \s1011 can be understood correctly as ‘size
10, followed by an 11’, if the size is legal, but not otherwise. Be cautious with similar
constructions.


Relative size changes are also legal and 
useful: 


\s-2UNIX\s+2


temporarily decreases the size, whatever it is, by 
two points, then restores it. Relative size 
changes have the advantage that the size 
difference is independent of the starting size of 
the document. The amount of the relative 
change is restricted to a single digit. 
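The in-line forms can be modelled the same way. This Python sketch is a toy model, not troff's parser. In it, \s0 swaps back to the previous size, and a relative change must be a sign followed by one digit. Any result is then rounded up to a legal size as for .ps; that last step is an assumption.

```python
LEGAL_SIZES = (6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, 36)


def legal_size(n):
    """Round n up to the next legal size, capped at 36."""
    return next((s for s in LEGAL_SIZES if s >= n), LEGAL_SIZES[-1])


def apply_size_escape(current, previous, arg):
    """Return (new_size, new_previous) after an in-line size escape.

    arg is the text after the backslash-s: "0" reverts, "+N" or "-N"
    is a relative change of one digit, anything else is absolute.
    """
    if arg == "0":
        return previous, current
    if arg[0] in "+-":
        if len(arg) != 2 or not arg[1].isdigit():
            raise ValueError("a relative change is a sign followed by one digit")
        target = current + int(arg)            # int("+2") == 2, int("-2") == -2
    else:
        target = int(arg)
    return legal_size(target), current


if __name__ == "__main__":
    size, prev = apply_size_escape(10, 10, "-2")   # \s-2UNIX ...
    size, prev = apply_size_escape(size, prev, "+2")  # ... \s+2
    print(size)                                    # 10, whatever the size was before
```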


The other parameter that determines what 
the type looks like is the spacing between lines, 
which is set independently of the point size. 
Vertical spacing is measured from the bottom of 
one line to the bottom of the next. The com- 
mand to control vertical spacing is .vs. For run- 
ning text, it is usually best to set the vertical 
spacing about 20% bigger than the character size. 
For example, so far in this document, we have 
used ‘9 on 11’, that is,

.ps 9
.vs 11p


If we changed to

.ps 9
.vs 9p

the running text would look like this. After a few lines, you will agree it looks a little
cramped. The right vertical spacing is partly a matter of taste, depending on how much text
you want to squeeze into a given space, and partly a matter of traditional printing style.
By default, troff uses 10 on 12.
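The 20% rule of thumb reproduces the pairings used in this document (6 on 7, 7 on 8, 9 on 11, 10 on 12, 12 on 14). Here is a small Python sketch; rounding to whole points is an assumption.

```python
def suggested_vs(point_size, extra=0.20):
    """Rule-of-thumb vertical spacing: about 20% more than the point size."""
    return round(point_size * (1 + extra))


if __name__ == "__main__":
    for ps in (6, 7, 9, 10, 12):
        print(f"{ps} on {suggested_vs(ps)}")   # 6 on 7, 7 on 8, 9 on 11, 10 on 12, 12 on 14
```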


Point size and vertical spacing 
make a substantial difference in the 
amount of text per square inch. 
This is 12 on 14. 


Point size and vertical spacing make a substantial difference in the amount of text per
square inch. For example, 10 on 12 uses about twice as much space as 7 on 8. This is 6 on 7,
which is even smaller. It packs a lot more words per line, but you can go blind trying to
read it.


When used without arguments, .ps and .vs 
revert to the previous size and vertical spacing 
respectively. 


The command .sp is used to get extra vertical space. Unadorned, it gives you one extra
blank line (one .vs, whatever that has been set to). Typically, that's more or less than
you want, so .sp can be followed by information about how much space you want:

.sp 2i

means ‘two inches of vertical space’;

.sp 2p

means ‘two points of vertical space’; and

.sp 2

means ‘two vertical spaces’, that is, two of whatever .vs is set to (this can also be made
explicit with .sp 2v). troff also understands decimal fractions in most places, so

.sp 1.5i

is a space of 1.5 inches. These same scale factors can be used after .vs to define line
spacing, and in fact after most commands that deal with physical dimensions.


It should be noted that all size numbers 
are converted internally to ‘machine units’, 
which are 1/432 inch (1/6 point). For most pur- 
poses, this is enough resolution that you don't 
have to worry about the accuracy of the 
representation. The situation is not quite so 
good vertically, where resolution is 1/144 inch 
(1/2 point). 
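The conversion to machine units is simple arithmetic. The Python sketch below handles only the i, p and v scale factors mentioned so far. Rounding to the nearest unit is an assumption about troff's behaviour, and the function name is hypothetical.

```python
from fractions import Fraction

POINTS_PER_INCH = 72
HORIZONTAL_UNITS_PER_INCH = 432   # 1/432 inch = 1/6 point
VERTICAL_UNITS_PER_INCH = 144     # 1/144 inch = 1/2 point


def to_machine_units(value, unit, vertical=False, vs_points=None):
    """Convert a value with scale factor 'i', 'p' or 'v' to machine units."""
    value = Fraction(str(value))               # exact decimal arithmetic, e.g. 1.5 -> 3/2
    if unit == "i":
        inches = value
    elif unit == "p":
        inches = value / POINTS_PER_INCH
    elif unit == "v":
        if vs_points is None:
            raise ValueError("'v' needs the current vertical spacing in points")
        inches = value * vs_points / POINTS_PER_INCH
    else:
        raise ValueError(f"unsupported scale factor: {unit!r}")
    per_inch = VERTICAL_UNITS_PER_INCH if vertical else HORIZONTAL_UNITS_PER_INCH
    return round(inches * per_inch)


if __name__ == "__main__":
    print(to_machine_units(1.5, "i", vertical=True))              # .sp 1.5i
    print(to_machine_units(2, "v", vertical=True, vs_points=11))  # .sp 2v at 9 on 11
```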


3. Fonts and Special Characters 


troff and the typesetter allow four different fonts at any one time. Normally three fonts
(Times roman, italic and bold) and one collection of special characters are permanently
mounted.

abcdefghijklmnopqrstuvwxyz 0123456789
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz 0123456789
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz 0123456789
ABCDEFGHIJKLMNOPQRSTUVWXYZ

The greek, mathematical symbols and miscellany of the special font are listed in
Appendix A.

troff prints in roman unless told otherwise. To switch into bold, use the .ft command

.ft B

and for italics,

.ft I

To return to roman, use .ft R; to return to the previous font, whatever it was, use either
.ft P or just .ft. The ‘underline’ command

.ul

causes the next input line to print in italics. .ul can be followed by a count to indicate
that more than one line is to be italicized.

Fonts can also be changed within a line or word with the in-line command \f:

boldface text

is produced by

\fBbold\fIface\fR text

If you want to do this so the previous font, whatever it was, is left undisturbed, insert
extra \fP commands, like this:

\fBbold\fP\fIface\fP\fR text\fP


Because only the immediately previous font is 
remembered, you have to restore the previous 
font after each change or you can lose it. The 
same is true of .ps and .vs when used without an 
argument. 
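The one-deep memory explains why the extra \fP commands are needed. Below is a toy Python model, a hypothetical class rather than troff's implementation. In it, every change, including \fP, makes the font being left the new ‘previous’ font.

```python
class FontState:
    """Toy model: troff remembers only the immediately previous font."""

    def __init__(self, font="R"):
        self.current = font
        self.previous = font

    def switch(self, font):
        """Apply a font change; 'P' means 'return to the previous font'."""
        target = self.previous if font == "P" else font
        self.previous, self.current = self.current, target
        return self.current


if __name__ == "__main__":
    careless = FontState()
    for f in ("B", "I", "P"):
        careless.switch(f)
    print(careless.current, careless.previous)   # B I: roman has been forgotten

    careful = FontState()
    for f in ("B", "P", "I", "P"):
        careful.switch(f)
    print(careful.current)                       # R
```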


There are other fonts available besides the standard set, although you can still use only
four at any given time. The command .fp tells troff what fonts are physically mounted on
the typesetter:

.fp 3 H

says that the Helvetica font is mounted on position 3. (For a complete list of fonts and
what they look like, see the troff manual.) Appropriate .fp commands should appear at the
beginning of your document if you do not use the standard fonts.


It is possible to make a document relatively independent of the actual fonts used to print
it by using font numbers instead of names; for example, \f3 and .ft 3 mean ‘whatever font
is mounted at position 3’, and thus work for any setting. Normal settings are roman font on
1, italic on 2, bold on 3, and special on 4.


There is also a way to get ‘synthetic’ bold 
fonts by overstriking letters with a slight offset. 
Look at the .bd command in [1]. 


Special characters have four-character names beginning with \(, and they may be inserted
anywhere. For example,

¼ + ½ = ¾

is produced by

\(14 + \(12 = \(34

In particular, greek letters are all of the form \(*x, where x is an upper or lower case
roman letter reminiscent of the greek. Thus to get

$\Sigma(\alpha \times \beta) \rightarrow \infty$

in bare troff we have to type

\(*S(\(*a\(mu\(*b)\(->\(if

That line is unscrambled as follows:

\(*S     Σ
(        (
\(*a     α
\(mu     ×
\(*b     β
)        )
\(->     →
\(if     ∞

A complete list of these special names occurs in Appendix A.


In eqn [2] the same effect can be achieved with the input

SIGMA ( alpha times beta ) -> inf

which is less concise, but clearer to the uninitiated.


Notice that each four-character name is a single character as far as troff is concerned;
the ‘translate’ command

.tr \(mi\(em

is perfectly clear, meaning that troff is to translate the minus sign (\(mi) into an em
dash (\(em).


Some characters are automatically translated into others: grave ` and acute ´ accents
(apostrophes) become open and close single quotes ‘ ’; the combination of ‘‘...’’ is
generally preferable to the double quotes "...". Similarly a typed minus sign becomes a
hyphen -. To print an explicit − sign, use \-. To get a backslash printed, use \e.


4. Indents and Line Lengths 


troff starts with a line length of 6.5 inches, too wide for 8½×11 paper. To reset the line length, use the .ll command, as in

.ll 6i

As with .sp, the actual length can be specified in several ways; inches are probably the most intuitive.


The maximum line length provided by the typesetter is 7.5 inches, by the way. To use the full width, you will have to reset the default physical left margin (“page offset”), which is normally slightly less than one inch from the left edge of the paper. This is done by the .po command.


.po 0 


sets the offset as far to the left as it will go. 


The indent command .in causes the left margin to be indented by some specified amount from the page offset. If we use .in to move the left margin in, and .ll to move the right margin to the left, we can make offset blocks of text:

.in 0.3i
.ll -0.3i
text to be set into a block
.ll +0.3i
.in -0.3i

will create a block that looks like this:


Pater noster qui est in caelis 
sanctificetur nomen tuum, adveniat 
regnum tuum; fiat voluntas tua, sicut 
in caelo, et in terra. ... Amen. 




Notice the use of ‘+’ and ‘-’ to specify the amount of change. These change the previous setting by the specified amount, rather than just overriding it. The distinction is quite important: .ll +1i makes lines one inch longer; .ll 1i makes them one inch long.


With .in, .ll and .po, the previous value is used if no argument is specified.


To indent a single line, use the ‘temporary indent’ command .ti. For example, all paragraphs in this memo effectively begin with the command

.ti 3


Three of what? The default unit for .ti, as for most horizontally oriented commands (.ll, .in, .po), is ems; an em is roughly the width of the letter ‘m’ in the current point size. (Precisely, an em in size p is p points.) Although inches are usually clearer than ems to people who don't set type for a living, ems have a place: they are a measure of size that is proportional to the current point size. If you want to make text that keeps its proportions regardless of point size, you should use ems for all dimensions. Ems can be specified as scale factors directly, as in .ti 2.5m.
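To make the unit arithmetic concrete, here is a small Python sketch of how a number with a scale indicator turns into machine units. It assumes 432 machine units per inch (the figure given in Section 6) and models only the i, P, p, m and u indicators; the function name to_units is hypothetical, and truncating to whole units is a simplification made by this sketch.

```python
UNITS_PER_INCH = 432                    # typesetter resolution assumed here
UNITS_PER_POINT = UNITS_PER_INCH // 72  # 72 points per inch, so 6 units

def to_units(value, scale, point_size=10):
    """Convert a number with a troff scale indicator to machine units.

    Only i, P, p, m and u are modeled; the result is truncated to an
    integer in this sketch.
    """
    factors = {
        "i": UNITS_PER_INCH,
        "P": 12 * UNITS_PER_POINT,          # pica = 12 points
        "p": UNITS_PER_POINT,
        "m": point_size * UNITS_PER_POINT,  # an em in size p is p points
        "u": 1,
    }
    if scale not in factors:
        raise ValueError(f"unsupported scale indicator: {scale!r}")
    return int(value * factors[scale])

# .ti 3 means three ems: the same input gives different distances
for size in (10, 20):
    print(size, to_units(3, "m", point_size=size))
```

The loop prints 180 units for 10-point text and 360 for 20-point text: the same .ti 3 indents twice as far at twice the size, which is why ems keep proportions.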


Lines can also be indented negatively if the indent is already positive:

.ti -0.3i

causes the next line to be moved back three tenths of an inch. Thus to make a decorative initial capital, we indent the whole paragraph, then move the letter ‘P’ back with a .ti command:


ater noster qui est in caelis 
sanctificetur nomen tuum, ad- 
veniat regnum tuum; fiat volun- 


tas tua, sicut in caelo, et in terra. ...
Amen. 


Of course, there is also some trickery to make the ‘P’ bigger (just a ‘\s36P\s0’), and to move it down from its normal position (see the section on local motions).


5. Tabs 


Tabs (the ASCII ‘horizontal tab’ character) can be used to produce output in columns, or to set the horizontal position of output. Typically tabs are used only in unfilled text. Tab stops are set by default every half inch from the current indent, but can be changed by the .ta command. To set stops every inch, for example,

.ta 1i 2i 3i 4i 5i 6i


Unfortunately the stops are left-justified only (as on a typewriter), so lining up columns of right-justified numbers can be painful. If you have many numbers, or if you need more complicated table layout, don't use troff directly; use the tbl program described in [3].


For a handful of numeric columns, you can do it this way: precede every number by enough blanks to make it line up when typed.

.nf
.ta 1i 2i 3i
  1 tab   2 tab   3
 40 tab  50 tab  60
700 tab 800 tab 900
.fi


Then change each leading blank into the string \0. This is a character that does not print, but that has the same width as a digit. When printed, this will produce

  1   2   3
 40  50  60
700 800 900
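Changing leading blanks into \0 by hand is tedious, so a short Python sketch (not part of troff) can generate the padded input instead. The helper pad_column is hypothetical; it right-aligns each column by prefixing the digit-width space \0 once for every missing digit.

```python
def pad_column(numbers):
    """Prefix each number with enough digit-width escapes to right-align it."""
    width = max(len(str(n)) for n in numbers)
    return [r"\0" * (width - len(str(n))) + str(n) for n in numbers]

rows = [[1, 2, 3], [40, 50, 60], [700, 800, 900]]
columns = [pad_column(list(col)) for col in zip(*rows)]
for cells in zip(*columns):
    print("\t".join(cells))   # real tab characters between the columns
```

The first line printed is \0\01, a tab, \0\02, a tab, and \0\03, ready to be placed between the .nf and .fi lines above.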


It is also possible to fill up tabbed-over space with some character other than blanks by setting the ‘tab replacement character’ with the .tc command:

.ta 1.5i 2.5i
.tc \(ru     (\(ru is ‘_’)
Name tab Age tab

produces

Name__________Age______


To reset the tab replacement character to a blank, use .tc with no argument. (Lines can also be drawn with the \l command, described in Section 6.)


troff also provides a very general mechan- 
ism called ‘fields’ for setting up complicated 
columns. (This is used by tbl). We will not go 
into it in this paper. 


6. Local Motions: Drawing lines and characters


Remember ‘Area = πr²’ and the big ‘P’ in the Paternoster. How are they done? troff provides a host of commands for placing characters of any size at any place. You can use them to draw special characters or to tune your output for a particular appearance. Most of these commands are straightforward, but messy to read and tough to type correctly.


If you won't use eqn, subscripts and superscripts are most easily done with the half-line local motions \u and \d. To go back up the page half a point-size, insert a \u at the desired place; to go down, insert a \d. (\u and \d should always be used in pairs, as explained below.) Thus


Area = \(*pr\u2\d

produces

Area = πr²

To make the ‘2’ smaller, bracket it with \s-2...\s0. Since \u and \d refer to the current point size, be sure to put them either both inside or both outside the size changes, or you will get an unbalanced vertical motion.


Sometimes the space given by \u and \d isn't the right amount. The \v command can be used to request an arbitrary amount of vertical motion. The in-line command

\v'(amount)'

causes motion up or down the page by the amount specified in ‘(amount)’. For example, to move the ‘P’ down, we used


.in +0.6i    (move paragraph in)
.ll -0.3i    (shorten lines)
.ti -0.3i    (move P back)
\v'2'\s36P\s0\v'-2'ater noster qui est
in caelis ...


A minus sign causes upward motion, while no sign or a plus sign means down the page. Thus \v'-2' causes an upward vertical motion of two line spaces.


There are many other ways to specify the amount of motion:

\v'0.1i'
\v'3p'
\v'-0.5m'

and so on are all legal. Notice that the scale specifier i or p or m goes inside the quotes. Any character can be used in place of the quotes; this is also true of all other troff commands described in this section.


Since troff does not take within-the-line 
vertical motions into account when figuring out 
where it is on the page, output lines can have 
unexpected positions if the left and right ends 
aren't at the same vertical position. Thus \v, 
like \u and \d, should always balance upward 
vertical motion in a line with the same amount 
in the downward direction. 


Arbitrary horizontal motions are also available; \h is quite analogous to \v, except that the default scale factor is ems instead of line spaces. As an example,

\h'-0.1i'

causes a backwards motion of a tenth of an inch. As a practical matter, consider printing the mathematical symbol ‘>>’. The default spacing is too wide, so eqn replaces this by

>\h'-0.3m'>

to produce >>.


Frequently \h is used with the ‘width function’ \w to generate motions equal to the width of some character string. The construction

\w'thing'

is a number equal to the width of ‘thing’ in machine units (1/432 inch). All troff computations are ultimately done in these units. To move horizontally the width of an ‘x’, we can say

\h'\w'x'u'

As we mentioned above, the default scale factor for all horizontal dimensions is m, ems, so here we must have the u for machine units, or the motion produced will be far too large. troff is quite happy with the nested quotes, by the way, so long as you don't leave any out.


As a live example of this kind of construction, all of the command names in the text, like .sp, were done by overstriking with a slight offset. The commands for .sp are

.sp\h'-\w'.sp'u'\h'1u'.sp

That is, put out ‘.sp’, move left by the width of ‘.sp’, move right 1 unit, and print ‘.sp’ again. (Of course there is a way to avoid typing that much input for each command name, which we will discuss in Section 11.)
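Since the overstrike recipe is purely mechanical, it can be generated. The Python function below is a hypothetical helper that builds the same kind of troff input for any word. Unlike the example above, it prefixes \& so that a word beginning with a period is not taken as a command, the same precaution the .BD macro of Section 11 takes.

```python
def fake_bold(word, offset_units=1):
    """Return troff input that prints `word`, moves back by its width,
    moves right `offset_units` machine units, and prints it again."""
    return (r"\&" + word
            + r"\h'-\w'" + word + r"'u'"          # back up by the width of word
            + r"\h'" + str(offset_units) + r"u'"  # then forward a little
            + word)

print(fake_bold(".sp"))
```

This prints \&.sp\h'-\w'.sp'u'\h'1u'.sp, which is the text's recipe with the leading \& added.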


There are also several special-purpose troff commands for local motion. We have already seen \0, which is an unpaddable white space of the same width as a digit. ‘Unpaddable’ means that it will never be widened or split across a line by line justification and filling. There is also \(blank), which is an unpaddable character the width of a space, \|, which is half that width, \^, which is one quarter of the width of a space, and \&, which has zero width. (This last one is useful, for example, in entering a text line which would otherwise begin with a ‘.’.)


The command \o, used like

\o'set of characters'

causes (up to 9) characters to be overstruck, centered on the widest. This is nice for accents, as in

syst\o"e\(ga"me t\o"e\(aa"l\o"e\(aa"phonique

which makes

système téléphonique

The accents are \(ga and \(aa, or \` and \'; remember that each is just one character to troff.


You can make your own overstrikes with another special convention, \z, the zero-motion command. \zx suppresses the normal horizontal motion after printing the single character x, so another character can be laid on top of it. Although sizes can be changed within \o, it centers the characters on the widest, and there can be no horizontal or vertical motions, so \z may be the only way to get what you want:

[four nested squares, in 8, 14, 22 and 36 point]

is produced by

.sp 2
\s8\z\(sq\s14\z\(sq\s22\z\(sq\s36\(sq

The .sp is needed to leave room for the result.


As another example, an extra-heavy semicolon that looks like

; instead of ; or ;

can be constructed with a big comma and a big period above it:

\s+6\z,\v'-0.25m'.\v'0.25m'\s0

‘0.25m’ is an empirical constant.


A more ornate overstrike is given by the bracketing function \b, which piles up characters vertically, centered on the current baseline. Thus we can get big brackets, constructing them with piled-up smaller pieces:

[a large brace and a large bracket on each side of x]

by typing in only this:

.sp
\b'\(lt\(lk\(lb' \b'\(lc\(lf' x \b'\(rc\(rf' \b'\(rt\(rk\(rb'


troff also provides a convenient facility for drawing horizontal and vertical lines of arbitrary length with arbitrary characters. \l'1i' draws a line one inch long, like this: ____________. The length can be followed by the character to use if the default rule isn't appropriate; \l'0.5i.' draws a half-inch line of dots: ............... The construction \L is entirely analogous, except that it draws a vertical line instead of horizontal.


7. Strings 


Obviously if a paper contains a large number of occurrences of an acute accent over a letter ‘e’, typing \o"e\'" for each é would be a great nuisance.


Fortunately, troff provides a way in which you can store an arbitrary collection of text in a ‘string’, and thereafter use the string name as a shorthand for its contents. Strings are one of several troff mechanisms whose judicious use lets you type a document with less effort and organize it so that extensive format changes can be made with few editing changes.


A reference to a string is replaced by whatever text the string was defined as. Strings are defined with the command .ds. The line

.ds e \o"e\'"

defines the string e to have the value \o"e\'".


String names may be either one or two characters long, and are referred to by \*x for one character names or \*(xy for two character names. Thus to get téléphone, given the definition of the string e as above, we can say t\*el\*ephone.
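The substitution troff performs for string references can be modeled in a few lines of Python. This sketch handles only the \*x and \*(xy forms and makes a single pass, so strings defined in terms of other strings are not rescanned; it also treats an undefined name as empty. The name expand_strings is hypothetical.

```python
import re

# Matches \*x (one-character name) or \*(xy (two-character name).
STRING_REF = re.compile(r"\\\*(?:\((..)|(.))")

def expand_strings(text, strings):
    """Replace string references with their definitions (one pass only)."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        return strings.get(name, "")  # undefined: empty in this sketch
    return STRING_REF.sub(lookup, text)

E_ACUTE = "\\o\"e\\'\""          # the troff text \o"e\'"
defs = {"e": E_ACUTE}
print(expand_strings("t\\*el\\*ephone", defs))
```

It prints t\o"e\'"l\o"e\'"phone, the input troff would go on to format as téléphone.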


If a string must begin with blanks, define it as

.ds xx "     text

The double quote signals the beginning of the definition. There is no trailing quote; the end of the line terminates the string.


A string may actually be several lines long; 
if troff encounters a \ at the end of any line, it is 
thrown away and the next line added to the 
current one. So you can make a long string sim- 
ply by ending each line but the last with a 
backslash: 


.ds xx this \ 
is a very \ 
long string 
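The joining rule can be written down directly. In this Python sketch (join_continuations is a hypothetical name), a line ending in a backslash loses that backslash and is glued to the next line. It ignores the case of a line that ends in an escaped backslash \\, which a real formatter would have to distinguish.

```python
def join_continuations(lines):
    """Merge each line that ends in a backslash with the line after it."""
    joined, pending = [], ""
    for line in lines:
        if line.endswith("\\"):
            pending += line[:-1]        # drop the backslash, keep collecting
        else:
            joined.append(pending + line)
            pending = ""
    if pending:                         # input ended mid-continuation
        joined.append(pending)
    return joined

print(join_continuations([".ds xx this \\", "is a very \\", "long string"]))
```

The three input lines above become the single line .ds xx this is a very long string.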


Strings may be defined in terms of other strings, or even in terms of themselves; we will discuss some of these possibilities later.


8. Introduction to Macros 


Before we can go much further in troff, we need to learn a bit about the macro facility. In its simplest form, a macro is just a shorthand notation quite similar to a string. Suppose we want every paragraph to start in exactly the same way, with a space and a temporary indent of two ems:

.sp
.ti +2m

Then to save typing, we would like to collapse these into one shorthand line, a troff ‘command’ like

.PP

that would be treated by troff exactly as

.sp
.ti +2m


.PP is called a macro. The way we tell troff what .PP means is to define it with the .de command:

.de PP
.sp
.ti +2m
..

The first line names the macro (we used ‘.PP’ for ‘paragraph’, and upper case so it wouldn't conflict with any name that troff might already know about). The last line .. marks the end of the definition. In between is the text, which is simply inserted whenever troff sees the ‘command’ or macro call

.PP


A macro can contain any mixture of text and 
formatting commands. 


The definition of .PP has to precede its 
first use: undefined macros are simply ignored. 
Names are restricted to one or two characters. 


Using macros for commonly occurring sequences of commands is critically important. Not only does it save typing, but it makes later changes much easier. Suppose we decide that the paragraph indent is too small, the vertical space is much too big, and roman font should be forced. Instead of changing the whole document, we need only change the definition of .PP to something like

.de PP     \" paragraph macro
.sp 2p
.ti +3m
.ft R
..

and the change takes effect everywhere we used .PP.


\" is a troff command that causes the rest 
of the line to be ignored. We use it here to add 
comments to the macro definition (a wise idea 
once definitions get complicated). 

As another example of macros, consider these two which start and end a block of offset, unfilled text, like most of the examples in this paper:

.de BS     \" start indented block
.sp
.nf
.in +0.3i
..
.de BE     \" end indented block
.sp
.fi
.in -0.3i
..

Now we can surround text like

Copy to
John Doe
Richard Roberts
Stanley Smith

by the commands .BS and .BE, and it will come out as it did above. Notice that we indented by .in +0.3i instead of .in 0.3i. This way we can nest our uses of .BS and .BE to get blocks within blocks.


If later on we decide that the indent should 
be 0.5i, then it is only necessary to change the 
definitions of .BS and .BE, not the whole paper. 


9. Titles, Pages and Numbering 

This is an area where things get tougher, because nothing is done for you automatically. Of necessity, some of this section is a cookbook, to be copied literally until you get some experience.

Suppose you want a title at the top of each page, saying just

left top          center top          right top

In roff, one can say

.he 'left top'center top'right top'
.fo 'left bottom'center bottom'right bottom'

to get headers and footers automatically on every page. Alas, this doesn't work in troff, a serious hardship for the novice. Instead you have to do a lot of specification.


You have to say what the actual title is (easy); when to print it (easy enough); and what to do at and around the title line (harder). Taking these in reverse order, first we define a macro .NP (for ‘new page’) to process titles and the like at the end of one page and the beginning of the next:

.de NP
'bp
'sp 0.5i
.tl 'left top'center top'right top'
'sp 0.3i
..


To make sure we're at the top of a page, we issue a ‘begin page’ command 'bp, which causes a skip to top-of-page (we'll explain the ' shortly). Then we space down half an inch, print the title (the use of .tl should be self explanatory; later we will discuss parameterizing the titles), space another 0.3 inches, and we're done.


To ask for .NP at the bottom of each page, we have to say something like ‘when the text is within an inch of the bottom of the page, start the processing for a new page.’ This is done with a ‘when’ command .wh:

.wh -1i NP

(No ‘.’ is used before NP; this is simply the name of a macro, not a macro call.) The minus sign means ‘measure up from the bottom of the page’, so ‘-1i’ means ‘one inch from the bottom’.

The .wh command appears in the input outside the definition of .NP; typically the input would be

.de NP
...
..
.wh -1i NP


Now what happens? As text is actually being output, troff keeps track of its vertical position on the page, and after a line is printed within one inch from the bottom, the .NP macro is activated. (In the jargon, the .wh command sets a trap at the specified place, which is ‘sprung’ when that point is passed.) .NP causes a skip to the top of the next page (that's what the 'bp was for), then prints the title with the appropriate margins.
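A trap is just a vertical position that troff compares with its current position as lines are output. The Python sketch below assumes an 11-inch page at 432 units per inch and a fixed line spacing, and it starts counting from the very top of the page with no title; trap_position and line_that_springs_trap are hypothetical names.

```python
UNITS_PER_INCH = 432
PAGE_LENGTH = 11 * UNITS_PER_INCH     # 11-inch page, in machine units

def trap_position(distance, page_length=PAGE_LENGTH):
    """A negative distance is measured up from the bottom of the page."""
    return page_length + distance if distance < 0 else distance

def line_that_springs_trap(line_spacing, trap):
    """Count output lines until the vertical position reaches the trap."""
    position, lines = 0, 0
    while position < trap:
        position += line_spacing      # each output line moves down the page
        lines += 1
    return lines

trap = trap_position(-1 * UNITS_PER_INCH)   # like .wh -1i NP
print(trap, line_that_springs_trap(72, trap))
```

With 12-point spacing (72 units per line) the trap sits at 4320 units and is sprung after the 60th line.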


Why 'bp and 'sp instead of .bp and .sp? The answer is that .sp and .bp, like several other commands, cause a break to take place. That is, all the input text collected but not yet printed is flushed out as soon as possible, and the next input line is guaranteed to start a new line of output. If we had used .sp or .bp in the .NP macro, this would cause a break in the middle of the current output line when a new page is started. The effect would be to print the left-over part of that line at the top of the page, followed by the next input line on a new output line. This is not what we want. Using ' instead of . for a command tells troff that no break is to take place: the output line currently being filled should not be forced out before the space or new page.


The list of commands that cause a break is short and natural:

.bp   .br   .ce   .fi   .nf   .sp   .in   .ti

All others cause no break, regardless of whether you use a . or a '. If you really need a break, add a .br command at the appropriate place.

One other thing to beware of: if you're changing fonts or point sizes a lot, you may find that if you cross a page boundary in an unexpected font or size, your titles come out in that size and font instead of what you intended. Furthermore, the length of a title is independent of the current line length, so titles will come out at the default length of 6.5 inches unless you change it, which is done with the .lt command.


There are several ways to fix the problems of point sizes and fonts in titles. For the simplest applications, we can change .NP to set the proper size and font for the title, then restore the previous values, like this:

.de NP
'bp
'sp 0.5i
.ft R      \" set title font to roman
.ps 10     \" and size to 10 point
.lt 6i     \" and length to 6 inches
.tl 'left'center'right'
.ps        \" revert to previous size
.ft P      \" and to previous font
'sp 0.3i
..


This version of .NP does not work if the fields in the .tl command contain size or font changes. To cope with that requires troff's ‘environment’ mechanism, which we will discuss in Section 13.


To get a footer at the bottom of a page, you can modify .NP so it does some processing before the 'bp command, or split the job into a footer macro invoked at the bottom margin and a header macro invoked at the top of the page. These variations are left as exercises.


Output page numbers are computed automatically as each page is produced (starting at 1), but no numbers are printed unless you ask for them explicitly. To get page numbers printed, include the character % in the .tl line at the position where you want the number to appear. For example

.tl ''- % -''

centers the page number inside hyphens, as on this page. You can set the page number at any time with either .bp n, which immediately starts a new page numbered n, or with .pn n, which sets the page number for the next page but doesn't cause a skip to the new page. Again, .bp +n sets the page number to n more than its current value; .bp means .bp +1.
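The three-part title and its % substitution can be imitated in Python, assuming fixed-width characters (closer to what nroff does on a terminal than to troff's proportional fonts) and a title length counted in character positions. three_part_title is a hypothetical helper, not a troff feature.

```python
def three_part_title(spec, page, width):
    """Lay out a 'left'center'right' title in `width` fixed-width
    columns, replacing % with the page number."""
    delim = spec[0]                                  # first character delimits
    parts = (spec[1:].split(delim) + ["", "", ""])[:3]
    left, center, right = (p.replace("%", str(page)) for p in parts)
    line = [" "] * width
    line[:len(left)] = left                          # flush left
    start = (width - len(center)) // 2
    line[start:start + len(center)] = center         # centered
    line[width - len(right):] = right                # flush right
    return "".join(line)

print(repr(three_part_title("''- % -''", 7, 21)))
```

For page 7 this yields eight blanks, - 7 -, and eight more blanks, the centered page number shown above.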


10. Number Registers and Arithmetic 


troff has a facility for doing arithmetic, and for defining and using variables with numeric values, called number registers. Number registers, like strings and macros, can be useful in setting up a document so it is easy to change later. And of course they serve for any sort of arithmetic computation.


Like strings, number registers have one or two character names. They are set by the .nr command, and are referenced anywhere by \nx (one character name) or \n(xy (two character name).


There are quite a few pre-defined number registers maintained by troff, among them % for the current page number; nl for the current vertical position on the page; dy, mo and yr for the current day, month and year; and .s and .f for the current size and font. (The font is a number from 1 to 4.) Any of these can be used in computations like any other register, but some, like .s and .f, cannot be changed with .nr.


As an example of the use of number registers, in the -ms macro package [4], most significant parameters are defined in terms of the values of a handful of number registers. These include the point size for text, the vertical spacing, and the line and title lengths. To set the point size and vertical spacing for the following paragraphs, for example, a user may say

.nr PS 9
.nr VS 11

The paragraph macro .PP is defined (roughly) as follows:
follows: 

.de PP
.ps \\n(PS     \" reset size
.vs \\n(VSp    \" spacing
.ft R          \" font
.sp 0.5v       \" half a line
.ti +3m
..

This sets the font to Roman and the point size 
and line spacing to whatever values are stored in 
the number registers PS and VS. 


Why are there two backslashes? This is the eternal problem of how to quote a quote. When troff originally reads the macro definition, it peels off one backslash to see what's coming next. To ensure that another is left in the definition when the macro is used, we have to put in two backslashes in the definition. If only one backslash is used, point size and vertical spacing will be frozen at the time the macro is defined, not when it is used.


Protecting by an extra layer of backslashes 
is only needed for \n, \*, \$ (which we haven't 
come to yet), and \ itself. Things like \s, \f, \h, 
\v, and so on do not need an extra backslash, 
since they are converted by troff to an internal 
code immediately upon being seen. 
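The difference between one and two backslashes can be seen in a toy model of the two readings a macro line goes through. In this Python sketch, copy_mode is a hypothetical function that interpolates \n registers and reduces a doubled backslash to a single one; it ignores \*, \$ and every other escape, and treats an unset register as 0.

```python
import re

# A doubled backslash, or \n(xy, or \nx
COPY_ESCAPES = re.compile(r"\\(\\|n\((..)|n(.))")

def copy_mode(text, registers):
    """One reading pass: interpolate number registers and halve
    doubled backslashes (a simplified model, not troff itself)."""
    def replace(match):
        if match.group(1) == "\\":
            return "\\"
        name = match.group(2) or match.group(3)
        return str(registers.get(name, 0))
    return COPY_ESCAPES.sub(replace, text)

registers = {"PS": 10}
late = copy_mode(r".ps \\n(PS", registers)   # stored as .ps \n(PS
early = copy_mode(r".ps \n(PS", registers)   # stored as .ps 10
registers["PS"] = 9                          # the user changes PS later
print(copy_mode(late, registers))            # .ps 9
print(copy_mode(early, registers))           # .ps 10
```

The doubled version picks up the later value of PS when the macro is used; the single-backslash version is frozen at 10 from the moment of definition.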


Arithmetic expressions can appear anywhere that a number is expected. As a trivial example,

.nr PS \\n(PS-2

decrements PS by 2. Expressions can use the arithmetic operators +, -, *, /, % (mod), the relational operators >, >=, <, <=, =, and != (not equal), and parentheses.


Although the arithmetic we have done so far has been straightforward, more complicated things are somewhat tricky. First, number registers hold only integers. troff arithmetic uses truncating integer division, just like Fortran. Second, in the absence of parentheses, evaluation is done left-to-right without any operator precedence (including relational operators). Thus

7*-4+3/13

becomes ‘-1’. Number registers can occur anywhere in an expression, and so can scale indicators like p, i, m, and so on (but no spaces). Although integer division causes truncation, each number and its scale indicator is converted to machine units (1/432 inch) before any arithmetic is done, so 1i/2u evaluates to 0.5i correctly.
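Because the lack of precedence surprises most people, a small Python evaluator that follows the same rules may help. It is a sketch only: it handles integers and the operators +, -, * and / (with unary signs), evaluates strictly left to right, and truncates division toward zero. Scale indicators, %, relational operators and parentheses are left out, and troff_eval is a hypothetical name.

```python
import re

def trunc_div(a, b):
    """Integer division that truncates toward zero, as in Fortran."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

def troff_eval(expr):
    """Evaluate +, -, *, / strictly left to right on integers."""
    tokens = re.findall(r"\d+|[-+*/]", expr)
    pos = 0

    def operand():
        nonlocal pos
        sign = 1
        while tokens[pos] in ("+", "-"):   # unary signs, e.g. the - in 7*-4
            if tokens[pos] == "-":
                sign = -sign
            pos += 1
        value = int(tokens[pos])
        pos += 1
        return sign * value

    result = operand()
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        rhs = operand()
        if op == "+":
            result += rhs
        elif op == "-":
            result -= rhs
        elif op == "*":
            result *= rhs
        else:
            result = trunc_div(result, rhs)
    return result

print(troff_eval("7*-4+3/13"))   # -1
```

Working left to right: 7*-4 is -28, adding 3 gives -25, and -25/13 truncates to -1. With ordinary precedence the same string would give -28.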


The scale indicator u often has to appear when you wouldn't expect it, in particular when arithmetic is being done in a context that implies horizontal or vertical dimensions. For example,

.ll 7/2i

would seem obvious enough: 3½ inches. Sorry. Remember that the default units for horizontal parameters like .ll are ems. That's really ‘7 ems / 2 inches’, and when translated into machine units, it becomes zero. How about

.ll 7i/2

Sorry, still no good: the ‘2’ is ‘2 ems’, so ‘7i/2’ is small, although not zero. You must use

.ll 7i/2u

So again, a safe rule is to attach a scale indicator to every number, even constants.
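The three attempts can be checked by doing the arithmetic in machine units. This Python sketch assumes 10-point type (so an em is 60 units) and 432 units per inch. Every quantity is positive, so floor division here gives the same result as troff's truncation.

```python
UNITS_PER_INCH = 432

def inches(n):
    return n * UNITS_PER_INCH

def ems(n, point_size=10):
    return n * point_size * UNITS_PER_INCH // 72   # an em is point_size points

ll_a = ems(7) // inches(2)     # .ll 7/2i  : 7 ems divided by 2 inches
ll_b = inches(7) // ems(2)     # .ll 7i/2  : 7 inches divided by 2 ems
ll_c = inches(7) // 2          # .ll 7i/2u : 7 inches divided by 2 units

print(ll_a, ll_b, ll_c)        # 0 25 1512
print(ll_c / UNITS_PER_INCH)   # 3.5
```

Only the last form gives 1512 units, which is 3.5 inches; the second is a mere 25 units.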


For arithmetic done within a .nr command, there is no implication of horizontal or vertical dimension, so the default units are ‘units’, and 7i/2 and 7i/2u mean the same thing. Thus

.nr ll 7i/2
.ll \\n(llu

does just what you want, so long as you don't forget the u on the .ll command.


11. Macros with arguments 


The next step is to define macros that can change from one use to the next according to parameters supplied as arguments. To make this work, we need two things: first, when we define the macro, we have to indicate that some parts of it will be provided as arguments when the macro is called. Then when the macro is called we have to provide actual arguments to be plugged into the definition.


Let us illustrate by defining a macro .SM that will print its argument two points smaller than the surrounding text. That is, the macro call

.SM TROFF

will produce TROFF.

The definition of .SM is

.de SM
\s-2\\$1\s+2
..

Within a macro definition, the symbol \\$n refers to the nth argument that the macro was called with. Thus \\$1 is the string to be placed in a smaller point size when .SM is called.


As a slightly more complicated version, the following definition of .SM permits optional second and third arguments that will be printed in the normal size:

.de SM
\\$3\s-2\\$1\s+2\\$2
..

Arguments not provided when the macro is called are treated as empty, so

.SM TROFF ),

produces TROFF), while

.SM TROFF ). (

produces (TROFF). It is convenient to reverse the order of arguments because trailing punctuation is much more common than leading.
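Argument handling can be modeled in Python as well. In this sketch, parse_call and expand_args are hypothetical helpers. The body of .SM is written as it would be stored after definition, with one of each pair of backslashes already removed, and arguments are split at blanks except inside double quotes; the length of the argument list is the argument count.

```python
import re

ARG_REF = re.compile(r"\\\$([1-9])")

def parse_call(line):
    """Split a macro call into its name and arguments; double quotes
    group words that contain blanks."""
    name, _, rest = line.lstrip(".").partition(" ")
    args = [quoted or word
            for quoted, word in re.findall(r'"([^"]*)"|(\S+)', rest)]
    return name, args

def expand_args(body, args):
    """Substitute argument references; missing arguments become empty."""
    def arg(match):
        n = int(match.group(1))
        return args[n - 1] if n <= len(args) else ""
    return ARG_REF.sub(arg, body)

# Body of .SM as stored after definition (one backslash already removed).
SM_BODY = r"\$3\s-2\$1\s+2\$2"

name, args = parse_call(".SM TROFF ). (")
print(name, len(args), expand_args(SM_BODY, args))
```

The call .SM TROFF ). ( yields three arguments and the text (\s-2TROFF\s+2)., which troff prints as (TROFF).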


By the way, the number of arguments that a macro was called with is available in number register .$.

The following macro .BD is the one used to make the ‘bold roman’ we have been using for troff command names in text. It combines horizontal motions, width computations, and argument rearrangement.

.de BD
\&\\$3\f1\\$1\h'-\w'\\$1'u+1u'\\$1\fP\\$2
..

The \h and \w commands need no extra backslash, as we discussed above. The \& is there in case the argument begins with a period.


Two backslashes are needed with the \\$n commands, though, to protect one of them when the macro is being defined. Perhaps a second example will make this clearer. Consider a macro called .SH which produces section headings rather like those in this paper, with the sections numbered automatically, and the title in bold in a smaller size. The use is

.SH "Section title ..."

(If the argument to a macro is to contain blanks, then it must be surrounded by double quotes, unlike a string, where only one leading quote is permitted.)


Here is the definition of the .SH macro:

.nr SH 0             \" initialize section number
.de SH
.sp 0.3i
.ft B
.nr SH \\n(SH+1      \" increment number
.ps \\n(PS-1         \" decrease PS
\\n(SH. \\$1         \" number. title
.ps \\n(PS           \" restore PS
.sp 0.3i
.ft R
..


The section number is kept in number register 
SH, which is incremented each time just before it 
is used. (A number register may have the same 
name as a macro without conflict but a string 
may not.) 


We used \\n(SH instead of \n(SH and \\n(PS instead of \n(PS. If we had used \n(SH, we would get the value of the register at the time the macro was defined, not at the time it was used. If that's what you want, fine, but not here. Similarly, by using \\n(PS, we get the point size at the time the macro is called.


As an example that does not involve numbers, recall our .NP macro which had a

.tl 'left'center'right'

We could make these into parameters by using instead

.tl '\\*(LT'\\*(CT'\\*(RT'

so the title comes from three strings called LT, CT and RT. If these are empty, then the title will be a blank line. Normally CT would be set with something like

.ds CT - % -

to give just the page number between hyphens (as on the top of this page), but a user could supply private definitions for any of the strings.


12. Conditionals 


Suppose we want the .SH macro to leave two extra inches of space just before section 1, but nowhere else. The cleanest way to do that is to test inside the .SH macro whether the section number is 1, and add some space if it is. The .if command provides the conditional test that we can add just before the heading line is output:

.if \\n(SH=1 .sp 2i     \" first section only


The condition after the .if can be any arithmetic or logical expression. If the condition is logically true, or arithmetically greater than zero, the rest of the line is treated as if it were text, here a command. If the condition is false, or zero or negative, the rest of the line is skipped.

It is possible to do more than one command if a condition is true. Suppose several operations are to be done before section 1. One possibility is to define a macro .S1 and invoke it if we are about to do section 1 (as determined by an .if).

.de S1
--- processing for section 1 ---
..
.de SH
...
.if \\n(SH=1 .S1
...
..


An alternate way is to use the extended form of the .if, like this:

.if \\n(SH=1 \{--- processing
for section 1 ---\}

The braces \{ and \} must occur in the positions shown or you will get unexpected extra lines in your output. troff also provides an ‘if-else’ construction, which we will not go into here.


A condition can be negated by preceding it with !; we get the same effect as above (but less clearly) by using

.if !\\n(SH>1 .S1


There are a handful of other conditions that can be tested with .if. For example, is the current page even or odd?

.if e .tl ''even page title''
.if o .tl ''odd page title''

gives facing pages different titles when used inside an appropriate new page macro.


Two other conditions are t and n, which tell you whether the formatter is troff or nroff.

.if t troff stuff ...
.if n nroff stuff ...


Finally, string comparisons may be made in an .if:

.if 'string1'string2' stuff

does ‘stuff’ if string1 is the same as string2. The character separating the strings can be anything reasonable that is not contained in either string. The strings themselves can reference strings with \*, arguments with \$, and so on.
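The rules for these simple conditions fit in one small Python function. This sketch (if_condition is a hypothetical name) assumes escapes have already been interpolated and that a numeric condition has already been reduced to a single integer. It covers !, e, o, t, n, plain numbers and the delimited string comparison.

```python
def if_condition(cond, page_number, is_nroff=False):
    """Decide whether the rest of an .if line would be processed.

    `cond` is the condition after interpolation, e.g. "1", "!0",
    "e", "t", or "'abc'abc'".
    """
    negate = cond.startswith("!")
    if negate:
        cond = cond[1:]
    if cond in ("e", "o"):                   # even or odd page
        result = (page_number % 2 == 0) == (cond == "e")
    elif cond in ("t", "n"):                 # troff or nroff
        result = is_nroff == (cond == "n")
    elif cond[:1].isdigit() or cond[:1] in ("+", "-"):
        result = int(cond) > 0               # true only if greater than zero
    else:                                    # 'string1'string2'
        delim = cond[0]
        parts = cond[1:].split(delim)
        result = len(parts) == 3 and parts[2] == "" and parts[0] == parts[1]
    return result != negate

print(if_condition("e", 4), if_condition("'abc'abd'", 1))   # True False
```

Note that zero and negative numbers both count as false, matching the rule given for .if above.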


13. Environments


As we mentioned, there is a potential problem when going across a page boundary: parameters like size and font for a page title may well be different from those in effect in the text when the page boundary occurs. troff provides a very general way to deal with this and similar situations. There are three ‘environments’, each of which has independently settable versions of many of the parameters associated with processing, including size, font, line and title lengths, fill/nofill mode, tab stops, and even partially collected lines. Thus the titling problem may be readily solved by processing the main text in one environment and titles in a separate one with its own suitable parameters.


The command .ev n shifts to environment n; n must be 0, 1 or 2. The command .ev with no argument returns to the previous environment. Environment names are maintained in a stack, so calls for different environments may be nested and unwound consistently.
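A Python sketch of the stack discipline may make the behavior clearer. Only point size and font are modeled, the starting values of 10-point roman are an assumption for illustration, and the class name Environments is hypothetical.

```python
class Environments:
    """Three independent parameter sets selected through a stack,
    in the spirit of .ev (only size and font are modeled)."""

    def __init__(self):
        self.envs = [{"size": 10, "font": "R"} for _ in range(3)]
        self.stack = [0]               # start in environment 0

    @property
    def current(self):
        return self.envs[self.stack[-1]]

    def ev(self, n=None):
        if n is None:                  # .ev with no argument: go back
            if len(self.stack) > 1:
                self.stack.pop()
        elif n in (0, 1, 2):           # .ev n: push environment n
            self.stack.append(n)
        else:
            raise ValueError("environment must be 0, 1 or 2")

env = Environments()
env.current.update(size=9, font="I")   # body text in environment 0
env.ev(1)                               # title processing, as in .NP
env.current.update(size=10, font="R")
env.ev()                                # back to the text
print(env.stack, env.current)           # [0] {'size': 9, 'font': 'I'}
```

Whatever the title code does in environment 1, the text's 9-point italic settings in environment 0 survive untouched.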


Suppose we say that the main text is pro- 
cessed in environment 0, which is where troff 
begins by default. Then we can modify the new 
page macro .NP to process titles in environment 
1 like this: ’ 


.de NP 

ev i \" shift to new environment 
At 63 \" set parameters here 

ft R 

.ps 10 


... any other processing ... 
ev \" return to previous environment 


It is also possible to initialize the parameters for
an environment outside the .NP macro, but the
version shown keeps all the processing in one
place and is thus easier to understand and
change.
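The stacking behavior of .ev can be modeled in a few lines. The sketch below is a toy Python model, not troff. The Formatter class, its envs list and the parameter keys ps and ft are hypothetical stand-ins for troff's internal state. It also assumes, for illustration only, that a bare .ev with nothing left to pop is an error.

```python
class Formatter:
    """Toy model of troff's three environments and the .ev stack."""

    def __init__(self):
        # Three independent parameter sets; troff starts in environment 0.
        self.envs = [{"ps": 10, "ft": "R"} for _ in range(3)]
        self.stack = [0]  # the current environment is the top entry

    @property
    def env(self):
        """Parameters of the environment currently in effect."""
        return self.envs[self.stack[-1]]

    def ev(self, n=None):
        """'.ev n' pushes environment n; a bare '.ev' pops back."""
        if n is None:
            if len(self.stack) == 1:
                raise RuntimeError("no previous environment")  # assumption
            self.stack.pop()
        elif n in (0, 1, 2):
            self.stack.append(n)
        else:
            raise ValueError("environment must be 0, 1 or 2")


# Mimic the .NP macro: titles are set in environment 1 without
# disturbing the main-text parameters held in environment 0.
f = Formatter()
f.env["ps"] = 12    # main text, environment 0
f.ev(1)             # .ev 1
f.env["ps"] = 10    # .ps 10 affects environment 1 only
f.ev()              # .ev
print(f.env["ps"])  # 12
```

Because each environment keeps its own copy of the parameters, the title settings made inside the macro never leak back into the main text.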


14. Diversions


There are numerous occasions in page lay- 
out when it is necessary to store some text for a 
period of time without actually printing it. Foot- 
notes are the most obvious example: the text of 
the footnote usually appears in the input well 
before the place on the page where it is to be 
printed is reached. In fact, the place where it is 
output normally depends on how big it is, which 
implies that there must be a way to process the 
footnote at least enough to decide its size 
without printing it. 


troff provides a mechanism called a diver- 
sion for doing this processing. Any part of the 
output may be diverted into a macro instead of 
being printed, and then at some convenient time 
the macro may be put back into the input. 


The command .di xy begins a diversion:
all subsequent output is collected into the macro
xy until the command .di with no arguments is
encountered. This terminates the diversion.
The processed text is available at any time
thereafter, simply by giving the command


.xy


The vertical size of the last finished diversion is 
contained in the built-in number register dn. 


As a simple example, suppose we want to 
implement a ‘keep-release’ operation, so that 
text between the commands .KS and .KE will not 
be split across a page boundary (as for a figure or 
table). Clearly, when a .KS is encountered, we 
have to begin diverting the output so we can find 
out how big it is. Then when a .KE is seen, we 
decide whether the diverted text will fit on the 
current page, and print it either there if it fits, or 
at the top of the next page if it doesn’t. So: 


.de KS    \" start keep
.br       \" start fresh line
.ev 1     \" collect in new environment
.fi       \" make it filled text
.di XX    \" collect in XX
..
.de KE    \" end keep
.br       \" get last partial line
.di       \" end diversion
.if \\n(dn>=\\n(.t .bp    \" bp if doesn't fit
.nf       \" bring it back in no-fill
.XX       \" text
.ev       \" return to normal environment
..


Recall that number register nl is the current
position on the output page. Since output was
being diverted, this remains at its value when the
diversion started. dn is the amount of text in
the diversion; .t (another built-in register) is the
distance to the next trap, which we assume is at
the bottom margin of the page. If the diversion
is large enough to go past the trap, the .if is
satisfied, and a .bp is issued. In either case, the
diverted output is then brought back with .XX. It
is essential to bring it back in no-fill mode so
troff will do no further processing on it.
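The page-break decision in .KE comes down to comparing two numbers. The following is a Python sketch, not troff, that replays that test for a run of kept blocks. The function name lay_out_keeps, the shared height unit, and the rule that a block already at the top of a page is never moved are all assumptions of the model.

```python
def lay_out_keeps(block_heights, page_length):
    """Return the (1-based) page on which each kept block is printed.

    Toy model of the KS/KE macros: every block is a keep, the only trap
    is the bottom margin at page_length, and heights share one unit.
    """
    pages = []
    page, nl = 1, 0                  # nl: current position on the page
    for dn in block_heights:         # dn: size of the finished diversion
        t = page_length - nl         # .t: distance to the next trap
        if dn >= t and nl > 0:       # .if \\n(dn>=\\n(.t .bp
            page, nl = page + 1, 0   # assumption: never .bp on an empty page
        pages.append(page)
        nl += dn                     # .XX reads the text back in
    return pages


print(lay_out_keeps([3, 4, 5], 10))  # [1, 1, 2]
```

Note the >= in the test. A block that exactly fills the remaining space is also moved to the next page, just as the macro's .if would move it.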


This is not the most general keep-release, 
nor is it robust in the face of all conceivable 
inputs, but it would require more space than we 
have here to write it in full generality. This sec- 
tion is not intended to teach everything about 
diversions, but to sketch out enough that you 
can read existing macro packages with some 
comprehension. 
Appendix A: Phototypesetter Character Set
These characters exist in roman, italic, and bold. To get the one on the left, type the four-character
name on the right.


ff   \(ff     fi   \(fi     fl   \(fl     ffi  \(Fi     ffl  \(Fl
_    \(ru     —    \(em     ¼    \(14     ½    \(12     ¾    \(34
©    \(co     °    \(de     †    \(dg     ′    \(fm     ¢    \(ct
®    \(rg     •    \(bu     □    \(sq     -    \(hy

(In bold, \(sq is ■.)


The following are special-font characters:


+  \(pl     −  \(mi     ×  \(mu     ÷  \(di
=  \(eq     ≡  \(==     ≥  \(>=     ≤  \(<=
≠  \(!=     ±  \(+-     ¬  \(no     /  \(sl
~  \(ap     ≃  \(~=     ∝  \(pt     ∇  \(gr
→  \(->     ←  \(<-     ↑  \(ua     ↓  \(da
∫  \(is     ∂  \(pd     ∞  \(if     √  \(sr
⊂  \(sb     ⊃  \(sp     ∪  \(cu     ∩  \(ca
⊆  \(ib     ⊇  \(ip     ∈  \(mo     ∅  \(es
´  \(aa     `  \(ga     ○  \(ci     (Bell System logo)  \(bs
§  \(sc     ‡  \(dd     ☜  \(lh     ☞  \(rh
⎧  \(lt     ⎫  \(rt     ⌈  \(lc     ⌉  \(rc
⎩  \(lb     ⎭  \(rb     ⌊  \(lf     ⌋  \(rf
⎨  \(lk     ⎬  \(rk     │  \(bv     ς  \(ts
│  \(br     |  \(or     _  \(ul     ‾  \(rn
∗  \(**


These four characters also have two-character names. The ´ is the apostrophe on terminals; the ` is the
other quote mark.


´  \´     `  \`     −  \-     _  \_


These characters exist only on the special font, but they do not have four-character names:


"  {  }  <  >  ~  ^  \  #  @


For Greek, precede the roman letter by \(* to get the corresponding Greek; for example, \(*a is α.


a b g d e z y h i k l m n c o p r s t u f x q w
α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ τ υ φ χ ψ ω

A B G D E Z Y H I K L M N C O P R S T U F X Q W
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
ABSTRACT


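This is the user’s guide for a system for typesetting mathematics, using the
phototypesetters on the UNIX and GCOS operating systems.
Mathematical expressions are described in a language designed to be easy to use by people
who know neither mathematics nor typesetting. Enough of the language to set in-line expressions
like $\lim_{x \to \pi/2} (\tan x)^{\sin 2x} = 1$ or display equations like

$$
\begin{aligned}
G(z) &= e^{\ln G(z)} = \exp\left(\sum_{k \ge 1} \frac{S_k z^k}{k}\right) = \prod_{k \ge 1} e^{S_k z^k / k} \\
&= \left(1 + S_1 z + \frac{S_1^2 z^2}{2!} + \cdots\right)
   \left(1 + \frac{S_2 z^2}{2} + \frac{S_2^2 z^4}{2^2 \cdot 2!} + \cdots\right) \cdots \\
&= \sum_{m \ge 0} \left( \sum_{\substack{k_1, k_2, \ldots, k_m \ge 0 \\ k_1 + 2k_2 + \cdots + m k_m = m}}
   \frac{S_1^{k_1}}{1^{k_1} k_1!} \, \frac{S_2^{k_2}}{2^{k_2} k_2!} \cdots \frac{S_m^{k_m}}{m^{k_m} k_m!} \right) z^m
\end{aligned}
$$

can be learned in an hour or so.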


The language interfaces directly with the phototypesetting language TROFF, so mathematical
expressions can be embedded in the running text of a manuscript, and the entire document
produced in one process. This user’s guide is an example of its output.


The same language may be used with the UNIX formatter NROFF to set mathematical 
expressions on DASI and GSI terminals and Model 37 teletypes.
1. Introduction


EQN is a program for typesetting
mathematics on the Graphics Systems
phototypesetters on UNIX and GCOS. The EQN
language was designed to be easy to use by
people who know neither mathematics nor
typesetting. Thus EQN knows relatively little
about mathematics. In particular,
mathematical symbols like +, −, ×,
parentheses, and so on have no special
meanings. EQN is quite happy to set garbage
(but it will look good).


EQN works as a preprocessor for the 
typesetter formatter, TROFF [1], so the
normal mode of operation is to prepare a
document with both mathematics and ordinary
text interspersed, and let EQN set the 
mathematics while TROFF does the body of 
the text. 


On UNIX, EQN will also produce
mathematics on DASI and GSI terminals and
on Model 37 teletypes. The input is
identical, but you have to use the programs NEQN
and NROFF instead of EQN and TROFF. Of
course, some things won't look as good
because terminals don’t provide the variety
of characters, sizes and fonts that a
typesetter does, but the output is usually
adequate for proofreading.


To use EQN on UNIX,


eqn files | troff


GCOS use is discussed in section 26.


2. Displayed Equations 


To tell EQN where a mathematical 
expression begins and ends, we mark it with 
lines beginning .EQ and .EN. Thus if you
type the lines


.EQ
x=y+z
.EN


your output will look like

$x=y+z$


The .EQ and .EN are copied through
untouched; they are not otherwise processed
by EQN. This means that you have to take
care of things like centering, numbering,
and so on yourself. The most common way
is to use the TROFF and NROFF macro
package ‘-ms’ developed by M. E.
Lesk [3], which allows you to center, indent,
left-justify and number equations.


With the ‘-ms’ package, equations are
centered by default. To left-justify an
equation, use .EQ L instead of .EQ. To indent it,
use .EQ I. Any of these can be followed by
an arbitrary ‘equation number’ which will be
placed at the right margin. For example,
the input


.EQ I (3.1a)
x = f(y/2) + y/2
.EN


produces the output

$x = f(y/2) + y/2$     (3.1a)


There is also a shorthand notation so
in-line expressions like $\pi_i$ can be entered
without .EQ and .EN. We will talk about it in
section 19.


3. Input spaces 


Spaces and newlines within an
expression are thrown away by EQN. (Normal text
is left absolutely alone.) Thus between .EQ
and .EN,


x=y+z
and

x = y + z
and

x   =   y
   + z


and so on all produce the same output

$x=y+z$


You should use spaces and newlines freely 
to make your input equations readable and 
easy to edit. In particular, very long lines 
are a bad idea, since they are often hard to 
fix if you make a mistake. 


4. Output spaces 


To force extra spaces into the output,
use a tilde “~” for each space you want:


x~=~y~+~z
gives
$x \ = \ y \ + \ z$


You can also use a circumflex “^”, which
gives a space half the width of a tilde. It is
mainly useful for fine-tuning. Tabs may
also be used to position pieces of an
expression, but the tab stops must be set by TROFF
commands.


5. Symbols, Special Names, Greek 


EQN knows some mathematical
symbols, some mathematical names, and the
Greek alphabet. For example,


x=2 pi int sin ( omega t)dt
produces
$x=2\pi\int\sin(\omega t)dt$


Here the spaces in the input are necessary
to tell EQN that int, pi, sin and omega are
separate entities that should get special
treatment. The sin, digit 2, and parentheses
are set in roman type instead of italic; pi and
omega are made Greek; and int becomes the
integral sign.

When in doubt, leave spaces around
separate parts of the input. A very common
error is to type f(pi) without leaving spaces
on both sides of the pi. As a result, EQN
does not recognize pi as a special word, and
it appears as $f(pi)$ instead of $f(\pi)$.


A complete list of EQN names appears
in section 23. Knowledgeable users can also
use TROFF four-character names for
anything EQN doesn’t know about, like \(bs for
the Bell System sign.


6. Spaces, Again 


The only way EQN can deduce that 
some sequence of letters might be special is 
if that sequence is separated from the letters 
on either side of it. This can be done by 
surrounding a special word by ordinary 
spaces (or tabs or newlines), as we did in 
the previous section. 


You can also make special words stand 
out by surrounding them with tildes or 
circumflexes: 


x~=~2~pi~int~sin~(~omega~t~)~dt
is much the same as the last example,
except that the tildes not only separate the
magic words like sin, omega, and so on, but
also add extra spaces, one space per tilde:


$x \ = \ 2 \ \pi \ \int \ \sin \ ( \ \omega \ t \ ) \ dt$


Special words can also be separated by
braces { } and double quotes "...", which
have special meanings that we will see soon.


7. Subscripts and Superscripts 


Subscripts and superscripts are
obtained with the words sub and sup.


x sup 2 + y sub k
gives
$x^2 + y_k$
EQN takes care of all the size changes and
vertical motions needed to make the output
look right. The words sub and sup must be
surrounded by spaces; x sub2 will give you
$xsub2$ instead of $x_2$. Furthermore, don't
forget to leave a space (or a tilde, etc.) to
mark the end of a subscript or superscript.
A common error is to say something like


y = (x sup 2)+1


which causes


$y = (x^{2)+1}$


instead of the intended
$y = (x^2) + 1$
Subscripted subscripts and superscripted
superscripts also work:


x sub i sub 1


$x_{i_1}$


A subscript and superscript on the same
thing are printed one above the other if the
subscript comes first:


x sub i sup 2


$x_i^2$


Other than this special case, sub and
sup group to the right, so x sup y sub z
means $x^{y_z}$, not ${x^y}_z$.


8. Braces for Grouping 

Normally, the end of a subscript or
superscript is marked simply by a blank (or
tab or tilde, etc.) What if the subscript or
superscript is something that has to be typed
with blanks in it? In that case, you can use
the braces { and } to mark the beginning and
end of the subscript or superscript:


e sup {i omega t}


$e^{i\omega t}$


Rule: Braces can always be used to force
EQN to treat something as a unit, or just to
make your intent perfectly clear. Thus:


x sub {i sub 1} sup 2


is $x_{i_1}^2$ with braces, but


x sub i sub 1 sup 2


is $x_{i_1^2}$, which is rather different.


Braces can occur within braces if 
necessary: 


e sup {i pi sup {rho +1}}


$e^{i\pi^{\rho+1}}$


The general rule is that anywhere you could 
use some single thing like x, you can use an 
arbitrarily complicated thing if you enclose it 
in braces. EQN will look after all the details 
of positioning it and making it the right size. 

In all cases, make sure you have the 
right number of braces. Leaving one out or 
adding an extra will cause EQN to complain 
bitterly. 

Occasionally you will have to print 
braces. To do this, enclose them in double 
quotes, like "{". Quoting is discussed in 
more detail in section 14. 
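The grouping rules of sections 7 and 8 can be captured by a small recursive parser. What follows is a sketch under stated assumptions, not EQN's actual grammar. It understands only blanks, braces, sub and sup. Every other word is treated as an opaque atom. It prints its result in TeX notation so that the grouping is visible. The names tokenize, parse and tex are hypothetical.

```python
import re


def tokenize(text):
    # Braces are always separate tokens; anything else is split at blanks.
    return re.findall(r"[{}]|[^\s{}]+", text)


def parse(text):
    """Parse EQN-like input into nested tuples (toy model)."""
    tokens, pos = tokenize(text), 0

    def primary():
        # A single word, or a whole braced group treated as one unit.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("}", "sub", "sup"):
            raise SyntaxError("sub or sup without an operand")
        tok = tokens[pos]
        pos += 1
        if tok != "{":
            return tok
        node = sequence()
        if pos >= len(tokens):
            raise SyntaxError("missing }")
        pos += 1  # consume the closing brace
        return node

    def script():
        # Collect  p0 op1 p1 op2 p2 ...  where each op is sub or sup.
        nonlocal pos
        prims, ops = [primary()], []
        while pos < len(tokens) and tokens[pos] in ("sub", "sup"):
            ops.append(tokens[pos])
            pos += 1
            prims.append(primary())
        # Group to the right; "a sub b sup c" becomes one stacked node.
        node, bare_sup = prims[-1], False
        for i in reversed(range(len(ops))):
            if ops[i] == "sub" and bare_sup:
                _, base, sup = node
                node, bare_sup = ("subsup", prims[i], base, sup), False
            else:
                node, bare_sup = (ops[i], prims[i], node), ops[i] == "sup"
        return node

    def sequence():
        items = []
        while pos < len(tokens) and tokens[pos] != "}":
            items.append(script())
        return items[0] if len(items) == 1 else ("row", items)

    tree = sequence()
    if pos < len(tokens):
        raise SyntaxError("unmatched }")
    return tree


def tex(node):
    """Show the grouping in TeX notation."""
    if isinstance(node, str):
        return node
    kind, *parts = node
    if kind == "row":
        return " ".join(tex(n) for n in parts[0])
    base = parts[0] if isinstance(parts[0], str) else "{" + tex(parts[0]) + "}"
    if kind == "sub":
        return f"{base}_{{{tex(parts[1])}}}"
    if kind == "sup":
        return f"{base}^{{{tex(parts[1])}}}"
    return f"{base}_{{{tex(parts[1])}}}^{{{tex(parts[2])}}}"  # subsup


for src in ["x sup y sub z", "x sub i sup 2",
            "x sub {i sub 1} sup 2", "x sub i sub 1 sup 2",
            "y = (x sup 2)+1"]:
    print(f"{src:24} -> {tex(parse(src))}")
```

The last case reproduces the common error from section 7. Because "2)+1" is a single word, all of it becomes the superscript.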


9. Fractions 
To make a fraction, use the word over: 


a+b over 2c =1


gives

$\frac{a+b}{2c} = 1$

The line is made the right length and
positioned automatically. Braces can be used to
make clear what goes over what:


{alpha + beta} over {sin (x)}


$\frac{\alpha+\beta}{\sin(x)}$


What happens when there is both an over
and a sup in the same expression? In such
an apparently ambiguous case, EQN does the
sup before the over, so


-b sup 2 over pi


is $\frac{-b^2}{\pi}$ instead of $-b^{\frac{2}{\pi}}$. The rules which
decide which operation is done first in cases
like this are summarized in section 23.
When in doubt, however, use braces to
make clear what goes with what.


10. Square Roots 
To draw a square root, use sqrt:
sqrt a+b + 1 over sqrt {ax sup 2 +bx+c}
is
$\sqrt{a+b} + \frac{1}{\sqrt{ax^2+bx+c}}$


Warning: square roots of tall quantities
look lousy, because a root-sign big enough
to cover the quantity is too dark and heavy:


sqrt {a sup 2 over b sub 2}


$\sqrt{\frac{a^2}{b_2}}$


Big square roots are generally better written
as something to the power ½:


$(a^2/b_2)^{1/2}$
which is
(a sup 2 /b sub 2 ) sup half


11. Summation, Integral, Etc. 


Summations, integrals, and similar
constructions are easy:
sum from i=0 to {i= inf} x sup i


produces


$\sum_{i=0}^{i=\infty} x^i$


Notice that we used braces to indicate where
the upper part $i=\infty$ begins and ends. No
braces were necessary for the lower part
$i=0$, because it contained no blanks. The
braces will never hurt, and if the from and to
parts contain any blanks, you must use
braces around them.


The from and to parts are both
optional, but if both are used, they have to
occur in that order.


Other useful characters can replace the
sum in our example:


int prod union inter
become, respectively,

$\int \quad \prod \quad \bigcup \quad \bigcap$

Since the thing before the from can be
anything, even something in braces, from-to can
often be used in unexpected ways:


lim from {n -> inf} x sub n =0


$\lim_{n\to\infty} x_n = 0$


12. Size and Font Changes 


By default, equations are set in 10- 
point type (the same size as this guide), 
with standard mathematical conventions to 
determine what characters are in roman and 
what in italic. Although EQN makes a vali- 
ant attempt to use esthetically pleasing sizes 
and fonts, it is not perfect. To change sizes 
and fonts, use size n and roman, italic, bold 
and fat. Like sub and sup, size and font 
changes affect only the thing that follows 
them, and revert to the normal situation at 
the end of it. Thus 


bold x y
is
$\mathbf{x}y$
and
size 14 bold x = y +
size 14 {alpha + beta}
gives


$\mathbf{x} = y + \alpha + \beta$ (with the bold x and the α+β in 14-point type)


As always, you can use braces if you want to
affect something more complicated than a
single letter. For example, you can change
the size of an entire equation by


size 12 { ... }


Legal sizes which may follow size are
6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24,
28, 36. You can also change the size by a
given amount; for example, you can say
size +2 to make the size two points bigger,
or size -3 to make it three points smaller.
This has the advantage that you don’t have
to know what the current size is.


If you are using fonts other than 
roman, italic and bold, you can say font X 
where X is a one character TROFF name or 
number for the font. Since EQN is tuned for 
roman, italic and bold, other fonts may not 
give quite as good an appearance. 


The fat operation takes the current
font and widens it by overstriking: fat grad is
a widened ∇, and fat {x sub ij} is a widened $x_{ij}$.


If an entire document is to be in a 
non-standard size or font, it is a severe
nuisance to have to write out a size and font
change for each equation. Accordingly, you
can set a “global” size or font which
thereafter affects all equations. At the
beginning of any equation, you might say,
for instance,


.EQ
gsize 16
gfont R
.EN


to set the size to 16 and the font to roman
thereafter. In place of R, you can use any
of the TROFF font names. The size after
gsize can be a relative change with + or -.


Generally, gsize and gfont will appear at 
the beginning of a document but they can 
also appear throughout a document: the
global font and size can be changed as often as
needed. For example, in a footnote† you
will typically want the size of equations to
match the size of the footnote text, which is
two points smaller than the main text.
Don't forget to reset the global size at the
end of the footnote.


†Like this one, in which we have a few random
expressions like $x_i$ and $\pi^2$. The sizes for these
were set by the command gsize -2.


13. Diacritical Marks 


To get funny marks on top of letters, 
there are several words: 


x dot        $\dot{x}$
x dotdot     $\ddot{x}$
x hat        $\hat{x}$
x tilde      $\tilde{x}$
x vec        $\vec{x}$
x dyad       $\overleftrightarrow{x}$
x bar        $\bar{x}$
x under      $\underline{x}$


The diacritical mark is placed at the right
height. The bar and under are made the
right length for the entire construct, as in
$\overline{x+y+z}$; other marks are centered.


14. Quoted Text 


Any input entirely within quotes
("...") is not subject to any of the font
changes and spacing adjustments normally
done by the equation setter. This provides a
way to do your own spacing and adjusting if
needed:


italic "sin(x)" + sin (x)


$\mathit{sin(x)} + \sin(x)$


Quotes are also used to get braces and
other EQN keywords printed:


"{ size alpha }"


is
{ size alpha }   (in italic)
and
roman "{ size alpha }"
is


{ size alpha }   (in roman)


The construction "" is often used as a
place-holder when grammatically EQN needs
something, but you don't actually want
anything in your output. For example, to make
$^2\mathrm{He}$, you can't just type sup 2 roman He
because a sup has to be a superscript on
something. Thus you must say


"" sup 2 roman He


To get a literal quote use “\"”. TROFF
characters like \(bs can appear unquoted,
but more complicated things like horizontal
and vertical motions with \h and \v should
always be quoted. (If you’ve never heard of
\h and \v, ignore this section.)


15. Lining Up Equations 


Sometimes it’s necessary to line up a 
series of equations at some horizontal posi- 
tion, often at an equals sign. This is done 
with two operations called mark and lineup. 


The word mark may appear once at
any place in an equation. It remembers the
horizontal position where it appeared.
Successive equations can contain one
occurrence of the word lineup. The place
where lineup appears is made to line up with
the place marked by the previous mark if at
all possible. Thus, for example, you can say


.EQ I
x+y mark = z
.EN
.EQ I
x lineup = 1
.EN


to produce
x+y = z
  x = 1


For reasons too complicated to talk about,
when you use EQN and ‘-ms’, use either
.EQ I or .EQ L; mark and lineup don’t work
with centered equations. Also bear in mind
that mark doesn't look ahead:


x mark =1


x+y lineup =z


isn't going to work, because there isn’t
room for the x+y part after the mark
remembers where the x is.
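The look-ahead restriction is easiest to see as arithmetic on widths. In this hypothetical Python sketch, widths are invented numbers in hundredths of an em. Raising an error stands in for whatever EQN actually does when there is not enough room. The function name lineup_indents is also made up for illustration.

```python
def lineup_indents(marked_width, lineup_widths):
    """Extra indent for each later equation so that its lineup point falls
    under the position remembered by mark.

    marked_width: width of the text before 'mark' in the first equation.
    lineup_widths: width of the text before 'lineup' in each later one.
    """
    indents = []
    for width in lineup_widths:
        shift = marked_width - width
        if shift < 0:
            # mark does not look ahead, so a later equation cannot start
            # to the left of the first one.
            raise ValueError("not enough room before the mark")
        indents.append(shift)
    return indents


# "x+y mark = z" then "x lineup = 1", with invented widths:
print(lineup_indents(250, [60]))  # [190]
```

Swapping the two widths ("x mark" first, "x+y lineup" later) gives a negative shift. That is the case the text warns will not work.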


16. Big Brackets, Etc.


To get big brackets [ ], braces { },
parentheses ( ), and bars | | around things,
use the left and right commands:


left { a over b + 1 right }
~ left ( c over d right )
+ left [ e right ]


$\left\{\frac{a}{b}+1\right\} \ \left(\frac{c}{d}\right) + \left[e\right]$


The resulting brackets are made big enough
to cover whatever they enclose. Other
characters can be used besides these, but they are
not likely to look very good. One exception
is the floor and ceiling characters:


left floor x over y right floor
<= left ceiling a over b right ceiling


produces


$\left\lfloor\frac{x}{y}\right\rfloor \le \left\lceil\frac{a}{b}\right\rceil$


Several warnings about brackets are in
order. First, braces are typically bigger than
brackets and parentheses, because they are
made up of three, five, seven, etc., pieces,
while brackets can be made up of two,
three, etc. Second, big left and right
parentheses often look poor, because the
character set is poorly designed.


The right part may be omitted: a “left
something” need not have a corresponding
“right something”. If the right part is
omitted, put braces around the thing you want
the left bracket to encompass. Otherwise,
the resulting brackets may be too large.


If you want to omit the left part, things
are more complicated, because technically
you can't have a right without a
corresponding left. Instead you have to say


left "" ... right )


for example. The left "" means a “left
nothing”. This satisfies the rules without
hurting your output.


17. Piles 


There is a general facility for making
vertical piles of things; it comes in several
flavors. For example:


A ~=~ left [
  pile { a above b above c }
  ~~ pile { x above y above z }
right ]


will make


$A = \left[ \begin{array}{c} a \\ b \\ c \end{array} \;\; \begin{array}{c} x \\ y \\ z \end{array} \right]$


The elements of the pile (there can be as
many as you want) are centered one above
another, at the right height for most
purposes. The keyword above is used to
separate the pieces; braces are used around
the entire list. The elements of a pile can
be as complicated as needed, even
containing more piles.

Three other forms of pile exist: lpile
makes a pile with the elements left-justified;
rpile makes a right-justified pile; and cpile
makes a centered pile, just like pile. The
vertical spacing between the pieces is
somewhat larger for l-, r- and cpiles than it is for
ordinary piles.


roman sign (x)~=~
left {
   lpile {1 above 0 above -1}
   ~~ lpile
    {if~x>0 above if~x=0 above if~x<0}


makes


$\mathrm{sign}(x) = \left\{ \begin{array}{ll} 1 & \mathrm{if}\ x>0 \\ 0 & \mathrm{if}\ x=0 \\ -1 & \mathrm{if}\ x<0 \end{array} \right.$


Notice the left brace without a matching
right one.


18. Matrices

It is also possible to make matrices.
For example, to make a neat array like


$\begin{matrix} x_i & x^2 \\ y_i & y^2 \end{matrix}$


you have to type


matrix {
  ccol { x sub i above y sub i }
  ccol { x sup 2 above y sup 2 }
}


This produces a matrix with two centered
columns. The elements of the columns are
then listed just as for a pile, each element
separated by the word above. You can also
use lcol or rcol to left or right adjust
columns. Each column can be separately
adjusted, and there can be as many columns
as you like.


The reason for using a matrix instead 
of two adjacent piles, by the way, is that if 
the elements of the piles don’t all have the 
same height, they won't line up properly. A 
matrix forces them to line up, because it 
looks at the entire structure before deciding 
what spacing to use. 


A word of warning about matrices — 
each column must have the same number of 
elements in it. The world will end if you get 
this wrong. 


19. Shorthand for In-line Equations 


In a mathematical document, it is 
necessary to follow mathematical conven- 
tions not just in display equations, but also 
in the body of the text, for example by mak- 
ing variable names like x italic. Although 
this could be done by surrounding the 
appropriate parts with .EQ and .EN, the
continual repetition of .EQ and .EN is a nuisance.
Furthermore, with ‘-ms’, .EQ and .EN imply
a displayed equation.


EQN provides a shorthand for short
in-line expressions. You can define two
characters to mark the left and right ends of an
in-line equation, and then type expressions
right in the middle of text lines. To set
both the left and right characters to dollar
signs, for example, add to the beginning of
your document the three lines


.EQ
delim $$
.EN


Having done this, you can then say things
like


Let $alpha sub i$ be the primary
variable, and let $beta$ be zero.
Then we can show that $x sub 1$ is
$>=0$.


This works as you might expect: spaces,
newlines, and so on are significant in the
text, but not in the equation part itself.
Multiple equations can occur in a single
input line.

Enough room is left before and after a
line that contains in-line expressions that
something like $\sum_{i=1}^{n} x_i$ does not interfere with
the lines surrounding it.

To turn off the delimiters,
.EQ
delim off
.EN


Warning: don't use braces, tildes,
circumflexes, or double quotes as delimiters;
chaos will result.
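It may help to see how delimiters divide a text line into running text and in-line equations. The same picture also shows why a missing delimiter causes trouble. The following Python model rests on two assumptions. First, a single character both opens and closes an equation. Second, every equation opens and closes on the same line, which is a simplification. The function split_inline is hypothetical.

```python
def split_inline(line, delim="$"):
    """Split one text line into ("text", s) and ("eqn", s) pieces.

    Toy model: one delimiter character opens and closes an equation,
    and each equation must close on the line where it opened.
    """
    pieces, i = [], 0
    while i < len(line):
        start = line.find(delim, i)
        if start < 0:                       # no more equations on this line
            pieces.append(("text", line[i:]))
            break
        if start > i:
            pieces.append(("text", line[i:start]))
        end = line.find(delim, start + 1)
        if end < 0:
            raise ValueError(f"unmatched {delim!r} in: {line}")
        pieces.append(("eqn", line[start + 1:end]))
        i = end + 1
    return pieces


print(split_inline("Then we can show that $x sub 1$ is"))
# [('text', 'Then we can show that '), ('eqn', 'x sub 1'), ('text', ' is')]
```

When one delimiter is left out, every later delimiter is paired with the wrong partner, so text and mathematics swap roles for the rest of the input. That is why the resulting troubles look so strange.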


20. Definitions 


EQN provides a facility so you can give 
a frequently-used string of characters a 
name, and thereafter just type the name 
instead of the whole string. For example, if 
the sequence 


x sub i sub 1 + y sub i sub 1


appears repeatedly throughout a paper, you
can save re-typing it each time by defining it
like this:


define xy 'x sub i sub 1 + y sub i sub 1'


This makes xy a shorthand for whatever 
characters occur between the single quotes 
in the definition. You can use any character 
instead of quote to mark the ends of the 
definition, so long as it doesn't appear inside 
the definition. 


Now you can use xy like this: 


.EQ
f(x) = xy ...
.EN
and so on. Each occurrence of xy will
expand into what it was defined as. Be
careful to leave spaces or their equivalent
around the name when you actually use it,
so EQN will be able to identify it as special.


There are several things to watch out 
for. First, although definitions can use pre- 
vious definitions, as in 


.EQ

define xi ' x sub i '
define xi1 ' xi sub 1 '
.EN


don't define something in terms of itself. A
favorite error is to say


define X ' roman X '


This is a guaranteed disaster, since X is now
defined in terms of itself. If you say


define X ' roman "X" '


however, the quotes protect the second X,
and everything works fine.


EQN keywords can be redefined. You
can make / mean over by saying


define / ' over '
or redefine over as / with


define over ' / '


If you need different things to print on
a terminal and on the typesetter, it is
sometimes worth defining a symbol differently in
NEQN and EQN. This can be done with
ndefine and tdefine. A definition made with
ndefine only takes effect if you are running
NEQN; if you use tdefine, the definition only
applies for EQN. Names defined with plain
define apply to both EQN and NEQN.
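Why a self-referential definition is a disaster, and why quoting fixes it, can be shown with a toy expander. This Python sketch is not EQN's implementation. It assumes definitions are plain text substitutions, that quoted words are never expanded, and that a name reappearing inside its own expansion should be reported, not expanded forever. The names TOKEN and expand are hypothetical, and ndefine/tdefine are not modeled.

```python
import re

# Quoted strings, braces, and blank-separated words are separate tokens.
TOKEN = re.compile(r'"[^"]*"|[{}]|[^\s{}"]+')


def expand(text, defs, active=()):
    """Expand define'd names in text (toy model of EQN's define).

    active holds the names currently being expanded, so a name that
    reappears inside its own expansion is reported instead of looping.
    Quoted words are never expanded.
    """
    out = []
    for tok in TOKEN.findall(text):
        if tok in defs:
            if tok in active:
                raise RecursionError(f"{tok} is defined in terms of itself")
            out.append(expand(defs[tok], defs, active + (tok,)))
        else:
            out.append(tok)
    return " ".join(out)


defs = {"xi": " x sub i ", "xi1": " xi sub 1 "}
print(expand("f(x) = xi1", defs))         # f(x) = x sub i sub 1
print(expand("X", {"X": ' roman "X" '}))  # roman "X"
```

Calling expand("X", {"X": " roman X "}) raises RecursionError. In this model that error takes the place of the endless expansion the text calls a guaranteed disaster.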


21. Local Motions 


Although EQN tries to get most things 
at the right place on the paper, it isn’t per- 
fect, and occasionally you will need to tune 
the output to make it just right. Small extra 
horizontal spaces can be obtained with tilde 
and circumflex. You can also say back n and
fwd n to move small amounts horizontally.
n is how far to move in 1/100’s of an em
(an em is about the width of the letter ‘m’.)
Thus back 50 moves back about half the
width of an m. Similarly you can move
things up or down with up n and down n. As
with sub or sup, the local motions affect the
next thing in the input, and this can be
something arbitrarily complicated if it is
enclosed in braces.


As an example of local motions,
consider tucking the limits in under an integral
sign. Normally if you say


int sub 0 sup 1


it looks like


$\int_0^1$


which is awful. The intuitively appealing


int from 0 to 1


gives


$\int\limits_0^1$


which is not normally used. But if you say


int sub back 40 down 50 0 sup up 30 1


you get an integral sign with its limits tucked in.
(These values are determined
experimentally.) Of course this is a nuisance to type,
so you would first make definitions like this:


tdefine lower ' sub back 40 down 50 '
tdefine upper ' sup up 30 '


and then say


int lower 0 upper 1
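Here is a small calculation in Python showing what the numbers in lower and upper amount to. It assumes a 10-point equation and an em equal to the current point size, the usual troff convention. net_motion is a hypothetical helper. It handles only the fwd, back, up and down words, so the sub and sup parts of the definitions are left out of its input.

```python
def net_motion(words, point_size=10):
    """Sum EQN local motions into a (dx, dy) offset in points.

    words is a string such as "back 40 down 50"; each n is in 1/100 em.
    Assumptions: 1 em equals the point size, right and down are positive.
    """
    dx = dy = 0.0
    tokens = words.split()
    for cmd, n in zip(tokens[::2], tokens[1::2]):
        dist = int(n) * point_size / 100
        if cmd == "fwd":
            dx += dist
        elif cmd == "back":
            dx -= dist
        elif cmd == "down":
            dy += dist
        elif cmd == "up":
            dy -= dist
        else:
            raise ValueError(f"not a local motion: {cmd}")
    return dx, dy


print(net_motion("back 40 down 50"))  # (-4.0, 5.0): the lower limit
print(net_motion("up 30"))            # (0.0, -3.0): the upper limit
```

At 10 points, the lower limit moves 4 points left and 5 points down, and the upper limit moves 3 points up. The same numbers would give larger moves at a larger point size, because the units scale with the em.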


22. A Large Example 


Here is the complete source for the
three display equations in the abstract of this
guide.


.EQ I
G(z)~mark =~ e sup { ln ~ G(z) }
~=~ exp left (
sum from k>=1 {S sub k z sup k} over k right )
~=~ prod from k>=1 e sup {S sub k z sup k /k}
.EN
.EQ I
lineup = left ( 1 + S sub 1 z +
{ S sub 1 sup 2 z sup 2 } over 2! + ... right )
left ( 1+ { S sub 2 z sup 2 } over 2
+ { S sub 2 sup 2 z sup 4 } over { 2 sup 2 cdot 2! }
+ ... right ) ...
.EN
.EQ I
lineup = sum from m>=0 left (
sum from
pile { k sub 1 ,k sub 2 ,..., k sub m >=0
above
k sub 1 +2k sub 2 + ... +mk sub m =m}
{ S sub 1 sup {k sub 1} } over {1 sup k sub 1 k sub 1 ! } ~
{ S sub 2 sup {k sub 2} } over {2 sup k sub 2 k sub 2 ! } ~
...
{ S sub m sup {k sub m} } over {m sup k sub m k sub m ! }
right ) z sup m
.EN


23. Keywords, Precedences, Etc. 


If you don’t use braces, EQN will do 
operations in the order shown in this list. 


dyad vec under bar tilde hat dot dotdot 
fwd back down up 

fat roman italic bold size 

sub sup sqrt over 

from to 


These operations group to the left: 
over sqrt left right 


All others group to the right. 


Digits, parentheses, brackets, punctua- 
tion marks, and these mathematical words 
are converted to Roman font when encoun- 
tered: 


sin cos tan sinh cosh tanh arc 
max min lim log ln exp
Re Im and if for det 


These character sequences are recognized 
and translated as shown. 


>=         ≥          approx     ≈
<=         ≤          nothing
==         ≡          cdot       ⋅
!=         ≠          times      ×
+-         ±          del        ∇
->         →          grad       ∇
<-         ←          sum        ∑
<<         ≪          int        ∫
>>         ≫          prod       ∏
inf        ∞          union      ∪
partial    ∂          inter      ∩
half       ½
prime      ′


To obtain Greek letters, simply spell 
them out in whatever case you want: 


DELTA     Δ       iota      ι
GAMMA     Γ       kappa     κ
LAMBDA    Λ       lambda    λ
OMEGA     Ω       mu        μ
PHI       Φ       nu        ν
PI        Π       omega     ω
PSI       Ψ       omicron   ο
SIGMA     Σ       phi       φ
THETA     Θ       pi        π
UPSILON   Υ       psi       ψ
XI        Ξ       rho       ρ
alpha     α       sigma     σ
beta      β       tau       τ
chi       χ       theta     θ
delta     δ       upsilon   υ
epsilon   ε       xi        ξ
eta       η       zeta      ζ
gamma     γ


These are all the words known to EQN
(except for characters with names), together
with the section where they are discussed.


above     17, 18     lpile      17
back      21         mark       15
bar       13         matrix     18
bold      12         ndefine    20
ccol      18         over       9
col       18         pile       17
cpile     17         rcol       18
define    20         right      16
delim     19         roman      12
dot       13         rpile      17
dotdot    13         size       12
down      21         sqrt       10
dyad      13         sub        7
fat       12         sup        7
font      12         tdefine    20
from      11         tilde      13
fwd       21         to         11
gfont     12         under      13
gsize     12         up         21
hat       13         vec        13
italic    12         ~ ^        4, 6
lcol      18         { }        8
left      16         "..."      8, 14
lineup    15


24. Troubleshooting 


If you make a mistake in an equation, 
like leaving out a brace (very common) or 
having one too many (very common) or 
having a sup with nothing before it (com- 
mon), EQN will tell you with the message 


syntax error between lines x and y, file z


where x and y are approximately the lines 
between which the trouble occurred, and z is 
the name of the file in question. The line 
numbers are approximate — look nearby as 
well. There are also self-explanatory mes- 
sages that arise if you leave out a quote or 
try to run EQN on a non-existent file. 


If you want to check a document 
before actually printing it (on UNIX only), 


eqn files >/dev/null


will throw away the output but print the 
messages. 


If you use something like dollar signs
as delimiters, it is easy to leave one out.
This causes very strange troubles. The
program checkeq (on GCOS, use ./checkeq
instead) checks for misplaced or missing
dollar signs and similar troubles.
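checkeq's exact rules are not described here. The following is therefore only a hypothetical Python sketch of the kind of checking involved, not a description of checkeq. It pairs .EQ with .EN and counts braces inside displayed equations. It does not exclude quoted braces, a simplification. It also flags text lines that hold an odd number of in-line delimiters.

```python
def check(lines, delim="$"):
    """Return (line number, message) pairs for likely EQN input mistakes.

    Toy checker: pairs .EQ/.EN, counts braces inside displayed
    equations (quoted braces are not excluded), and flags text lines
    holding an odd number of in-line delimiters.
    """
    problems, in_eq, depth = [], False, 0
    for n, line in enumerate(lines, 1):
        if line.startswith(".EQ"):
            if in_eq:
                problems.append((n, ".EQ inside an equation"))
            in_eq, depth = True, 0
        elif line.startswith(".EN"):
            if not in_eq:
                problems.append((n, ".EN without .EQ"))
            elif depth != 0:
                problems.append((n, "unbalanced braces"))
            in_eq = False
        elif in_eq:
            depth += line.count("{") - line.count("}")
        elif line.count(delim) % 2:
            problems.append((n, "odd number of in-line delimiters"))
    if in_eq:
        problems.append((len(lines), "missing .EN"))
    return problems


doc = [".EQ", "x sub {i sub 1 sup 2", ".EN", "Let $beta be zero."]
for n, msg in check(doc):
    print(n, msg)
```

Like EQN's own messages, the reported line numbers only point near the trouble. An unbalanced brace is noticed at the .EN, not on the line where the brace went missing.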


In-line equations can only be so big 
because of an internal buffer in TROFF. If 
you get a message ‘‘word overflow’’, you 
have exceeded this limit. If you print the 
equation as a displayed equation this mes- 
sage will usually go away. The message 
“line overflow” indicates you have
exceeded an even bigger buffer. The only 
cure for this is to break the equation into 
two separate ones. 


On a related topic, EQN does not break
equations by itself; you must split long
equations up across multiple lines by
yourself, marking each by a separate .EQ ... .EN
sequence. EQN does warn about equations
that are too long to fit on one line.


25. Use on UNIX 


To print a document that contains 
mathematics on the UNIX typesetter, 


eqn files | troff 


If there are any TROFF options, they go after 
the TROFF part of the command. For exam- 
ple, 


eqn files | troff -ms


To run the same document on the GCOS 
typesetter, use 


eqn files | troff -g (other options) | gcat


A compatible version of EQN can be
used on devices like teletypes and DASI and
GSI terminals, which have half-line forward
and reverse capabilities. To print equations
on a Model 37 teletype, for example, use


neqn files | nroff 


The language for equations recognized by 
NEQN is identical to that of EQN, although of 
course the output is more restricted. 


To use a GSI or DASI terminal as the
output device, 


neqn files | nroff -Tx 


where x is the terminal type you are using, 
such as 300 or 300S. 


EQN and NEQN can be used with the
TBL program [2] for setting tables that con-
tain mathematics. Use TBL before [N]EQN,
like this:


tbl files | eqn | troff
tbl files | neqn | nroff 


26. Use on GCOS 


This space intentionally left blank 
ABSTRACT

There is now available on UNIX and GCOS a set of special characters frequently used in
technical typing. In the past, authors have sometimes written out these symbols in English;
others just assumed their secretary or typist had these symbols ready and waiting. These
characters, however, are not part of the standard terminal or typesetter character sets, but are
built up from those already available. They can presently be produced for phototypesetter
output by using EQN/TROFF; NEQN/NROFF can be used for computer terminal output.


This document displays these characters, shows how to use them, and 
discusses what is involved in making a special character. 
New Graphic Symbols for EQN and NEQN


Carmela Scrocca


Bell Laboratories 
Introduction


There is now available on UNIX and GCOS a set of special characters frequently used in
technical typing. These characters supplement the ones that come with the typesetter and the
terminal, which both have their own set of standard characters. These special characters are
accessed through the math typesetting programs NEQN and EQN [1]. Processed through the
NROFF and TROFF [2] formatting programs, the characters can be output on either a computer
terminal or a phototypesetter. Using various NROFF/TROFF conventions, two or more of
these existing characters can be built up and pieced together to draw a new character.


Sections 1 and 2 of this document give a list of these characters and tell how to access and
use them. In Sections 3 and 4, the reader will see what is involved in making a special charac-
ter for both phototypesetter and computer terminal output.


1. The Characters 


Table 1 gives a list of the characters, their meanings, and the names by which EQN recog- 
nizes them. 


The user should be aware that these special characters are not built into the NEQN/EQN
program, but are stored in directory /usr/pub; the filename is eqnchar. In order to use any of
these symbols, this file will have to be referenced. This can be done by using file eqnchar as the
first filename in your output command, such as


neqn /usr/pub/eqnchar filenames | nroff


for computer terminal output. For phototypesetter output use EQN/TROFF instead of 
NEQN/NROFF. On GCOS, the characters are in file ./eqnchar. 


Some users will find a constant need for only a few of the special characters. It may be
convenient for these users to take their few selected characters and write them into a file in
their own directory. Other users may only need a few characters for use in one particular docu-
ment. They could edit /usr/pub/eqnchar, copy the desired character definitions into a separate
file, and read that file into the beginning of their document file.


Appendices 1a and 1b provide additional characters with their corresponding EQN names.
The characters listed in Appendix 1a are from the phototypesetter character set and have been
assigned EQN names; /usr/pub/eqnchar must be referenced in the output command to use
characters in this set. Appendix 1b contains a list of characters already built into EQN, which
can be used directly with the program.


character name                           character   EQN name

sum of two elements                      ⊕           ciplus
product of two elements                  ⊗           citimes
is congruent to                          ≅           =wig
approximately equal to                   ≐           =dot
equals by definition                     ≜           =del
large star                               ✳           bigstar
centered star                            ⋆           star

or                                       ∨           orsign
and                                      ∧           andsign
for all                                  ∀           oppA
there exists                             ∃           oppE

is included in                           ⊑           incl
not a member of                          ∉           nomem
angstrom                                 Å           angstrom
less than or approximately equal to      ≲           <wig
greater than or approximately equal to   ≳           >wig
not less than                            ≮           |<
not greater than                         ≯           |>

left angle bracket                       ⟨           langle
right angle bracket                      ⟩           rangle
hbar                                     ℏ           hbar
parallel                                 ∥           ||
perpendicular                            ⊥           ppd
angle                                    ∠           ang
right angle                              ∟           rang
implies and is implied by                ↔           <->
implies and is implied by                ⇔           <=>
vertical ellipsis                        ⋮           3dot
therefore                                ∴           thf


Table 1. The Special Characters


2. Usage 


The reader should be familiar with EQN as these characters work with the program and 
are used in the same manner as any other mathematical symbol. An EQN name must be 
separated from surrounding input (with spaces) in order to be recognized as a special character. 
For example, just as you would say 


pi ~=~ sum x sup t


to get


\pi = \sum x^{t}


you could say
x sub 3 ~=wig~ pi star y sub i ppd


to give you
x_3 \cong \pi \star y_i \perp


of which =wig, star, and ppd are a few of the new special characters. These symbols will work
in both displayed and in-line equations. There are no bold or italic versions of the symbols.


3. Creating Special Characters for EQN 


The symbols discussed here were created by taking pre-existing characters and piecing
them together with the input and output conventions and escape sequences made available
through NROFF and TROFF. Most commonly used here were the local horizontal and vertical
motions, overstrike and zero-width functions, and the point-size change function. These are
used primarily for phototypesetter output; making characters for computer terminal output will
be discussed in Section 4. The definitions of all the special characters are listed in Appendix 2.


Local horizontal and vertical motions are used to move a character up, down, or to the
left or right, depending on where you want to position your character. These motions are gen-
erated by the escape sequences \u, \d, \r, \v, and \h. The motions are expressed in terms of
ems; an em is approximately the width of the letter ‘m’. By using ems, the amount of motion
will always be in proportion to character size. The \u and \d sequences give vertical motions of
1/2 em up and down, respectively; \r gives an upward motion of 1 em. The \v (vertical motion)
and \h (horizontal motion) escape sequences allow you to move any fraction of an em. The
distance must be enclosed in ' marks, and the direction of movement is indicated by mak-
ing it either positive or negative. For example, if you wanted a downward vertical motion of
3/10 of an em, you would say \v'.3m', and an upward vertical motion of 6/10 of an em would
be \v'-.6m'. The same basic rules apply to horizontal motions, where positive moves to the
right and negative to the left. The “is much greater than” symbol >> shows a simple hor-
izontal motion:


>\h'-.3m'>


(>> is not a special character, but built into EQN; see Appendix 1b.) 


The overstrike function \o simply overprints characters on top of one another, centered on the
widest character. This function was used to create the “sum of two elements” symbol (⊕) by
saying


\o'\(pl\(ci'


where \(pl and \(ci are the escapes for + and ○. The string of characters to be overstruck must
be enclosed in ' marks. When using the overstrike function, be sure not to use any motions, or
it will not work.


Similar to the overstrike function is the bracket-building function \b. Instead of centering
one character on the other, the bracket-building function piles the characters vertically. The
“angle brackets” were built up using the \b function:

\s-3\b'\(sl\e'\s0      ⟨


\s-3\b'\e\(sl'\s0      ⟩


(\e and \(sl are the escapes for \ and /.) The first character in the string is positioned at the top
of the pile, and so on down, with the last character at the bottom. \s is the escape sequence for a
size change; we'll get to this in a while.


The zero-width function \z enables you to print a character without moving after it is
printed. \z often makes it easier to position your next character, rather than figuring out where
the first one moved you to. This function is used as \zn, where n is the character to be
printed. \z can only be applied to one character at a time, so the sequence has to be
repeated for each character. An example of this is the definition of the “vertical ellipsis” (⋮), where \z forces
the dots to remain in place; otherwise, horizontal motions would be needed to realign the dots:


\v'-.8m'\z.\v'.5m'\z.\v'.5m'.\v'-.2m'


The zero-width function can also be used to darken a character: \zxx will print an x, stay in
place, and print it again.

The point-size change function is used frequently to make one character fit in proportion
to another. The escape sequence for this function is \s, followed by the number of point sizes
you want to change: \s-2 will cause a reduction of 2 point sizes,
while \s+2 will enlarge by 2 point sizes. For example, in making the “equals by definition”
symbol (≜), the Δ was made slightly smaller to fit comfortably over the =. This was done by


\v'.3m'\z=\v'-.6m'\h'.3m'\s-1\(*D\s+1\v'.3m'


hence reducing the Δ (\(*D is the escape for this) by 1 point size before printing it, and then
returning to the previous size. \s0 can also be used to bring you back to the previous size.


Another handy tool is the font change function \f. Since EQN automatically sets its char-
acters in italics, it is necessary to specify any other font. To change fonts, use \fx, where x
is the desired font (R for roman, B for bold, I for italic, etc.). \fP will revert to the previ-
ous font. A prime example of this is the “angstrom” symbol (Å):


\fR\zA\v'-.3m'\h'.2m'\(de\v'.3m'\fP\h'.2m'


where the capital “A” is always printed in the Roman font.


A few words of caution. If in building your character you have used vertical motions or 
point-size or font changes, you must remember to undo them, or whatever follows will be off 
the main line or in the wrong size or font. It may also be necessary at times to use a horizontal 
motion to ensure enough space before and after your character. Also, you should test your 
character before putting it to actual use. Try it out with changes in original font or point size 
and see how it reacts to the change; don’t be surprised if your character falls apart. 
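The caution above about undoing motions and size changes can be checked mechanically. The following is a minimal Python sketch, not a tool supplied with EQN or TROFF; net_changes is a hypothetical helper. It adds up the \v'Nm', \u, \d, and \r motions and the \s size changes in a definition string, so a character that leaves the baseline or the point size altered shows a nonzero result. It assumes every \v motion is written in ems and ignores horizontal motions and font changes.

```python
import re

# Hypothetical checker (not part of EQN or TROFF).  Assumes vertical motions
# are written as \v'Nm' (ems), \u, \d or \r, and size changes as \s+N, \s-N
# or \s0.  As in TROFF, a positive vertical motion moves down the page.
TOKEN = re.compile(r"\\v'([+-]?\d*\.?\d+)m'|\\([udr])|\\s([+-]\d|0)")
STEP = {"u": -0.5, "d": 0.5, "r": -1.0}   # \u, \d: half an em; \r: one em up


def net_changes(defn):
    """Return (net vertical motion in ems, net point-size change)."""
    vertical = 0.0
    size, previous = 0, 0
    for motion, step, change in TOKEN.findall(defn):
        if motion:
            vertical += float(motion)
        elif step:
            vertical += STEP[step]
        elif change == "0":
            # \s0 returns to the previous size, so swap current and previous
            size, previous = previous, size
        elif change:
            size, previous = size + int(change), size
    return round(vertical, 6), size


# The "equals by definition" character returns to the baseline and size:
print(net_changes(r"\v'.3m'\z=\v'-.6m'\h'.3m'\s-1\(*D\s+1\v'.3m'"))
```

A result of (0.0, 0) means whatever follows the character stays on the main line and in the original size; anything else points at a motion or size change that was never undone.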


When you are satisfied with your character and it is ready to use, introduce it into your
file with the define facility provided by EQN. For example, you would define the “not greater
than” symbol (≯) as


define |> % "\o'>\(or'" %


Now by using |> in an equation, you will get ≯.


In the previous example, the %'s and "'s are two different kinds of delimiters. The out-
side pair is essential to mark the beginning and end of the definition. (%'s were used,
although any character will do as long as it is not used in the definition.) Also, any definition
containing TROFF commands should be enclosed in " marks, so EQN will treat the TROFF
commands as a unit.


4. Creating Special Characters for NEQN 


Getting a special character to print out on a computer terminal is not quite as involved as
on the phototypesetter. You are restricted to working with only the characters available on the
print wheel, and movement is limited. Vertical motion can be obtained with \u and \d, but this
will only give you a motion of 1/2 line space per escape sequence. Spacing and backspacing are
about all the horizontal motion you'll get. You can, however, use NEQN to define a character,
as was done with the “less than or approximately equal to” symbol (≲):


ndefine <wig %{ < from "~" }%


All TROFF conventions work in NROFF, but because of the lack of characters, they may
produce strange effects when using NEQN/NROFF. Therefore, it may be necessary to define
characters separately for phototypesetter and computer terminal output: tdefine applies to
definitions for EQN, ndefine works with NEQN, and define uses the same definition for both.


Built-up characters for terminal output are usually just good enough for identification’s 
sake; generally, they look lousy. However, given the time and additional effort, these charac- 
ters can be refined so that their output on the terminal is quite satisfactory. 
Appendix 1a


Additional Symbols from Phototypesetter Character Set


character   EQN name

¼           quarter
¾           3quarter
°           degree
□           square
○           circle
■           blot
•           bullet
∼           wig
∝           prop
∅           empty
∈           member
∪           cup
∩           cap*
⊂           subset*
⊃           supset*
⊆           !subset*
⊇           !supset*


* NEQN/NROFF does not produce these symbols.


Appendix 1b


Additional Symbols Provided by EQN


character   EQN name

½           half
≈           approx
≥           >=
≤           <=
≫           >>
≪           <<
            =>


* NEQN/NROFF does not produce these symbols.


Appendix 2 
Note: ^H represents a backspace in ndefined characters.


.EQ
tdefine ciplus % "\o'\(pl\(ci'" %
ndefine ciplus % O^H+ %
tdefine citimes % "\o'\(mu\(ci'" %
ndefine citimes % O^Hx %
tdefine =wig % "\(eq\h'-\w'\(eq'u-\w'\s-2\(ap'u/2u'\v'-.4m'\s-2\z\(ap\(ap\s+2\v'.4m'\h'\w'\(eq'u-\w'\s-2\(ap'u/2u'" %
ndefine =wig % =^H"~" %
tdefine bigstar % "\o'\(pl\(mu'" %
ndefine bigstar % X^H|^H- %
tdefine =dot % "\z\(eq\v'-.6m'\h'.2m'\s+2.\s-2\v'.6m'\h'.1m'" %
ndefine =dot % = dot %
tdefine orsign % "\s-2\v'-.15m'\z\e\e\h'-.05m'\z\(sl\(sl\v'.15m'\s+2" %
ndefine orsign % \e/ %
tdefine andsign % "\s-2\v'-.15m'\z\(sl\(sl\h'-.05m'\z\e\e\v'.15m'\s+2" %
ndefine andsign % /\e %
tdefine =del % "\v'.3m'\z=\v'-.6m'\h'.3m'\s-1\(*D\s+1\v'.3m'" %
ndefine =del % = to DELTA %
tdefine oppA % "\s-2\v'-.15m'\z\e\e\h'-.05m'\z\(sl\(sl\v'-.15m'\h'-.75m'\z-\z-\h'.2m'\z-\z-\v'.3m'\h'.4m'\s+2" %
ndefine oppA % V^H- %
tdefine oppE % "\s-3\v'.2m'\z\(em\v'-.5m'\z\(em\v'-.5m'\z\(em\v'.55m'\h'.9m'\z\(br\z\(br\v'.25m'\s+3" %
ndefine oppE % E^H/ %
tdefine incl % "\s-1\z\(or\h'-.1m'\v'-.45m'\z\(em\v'.7m'\z\(em\v'.2m'\(em\v'-.45m'\s+1" %
ndefine incl % C^H_ %
tdefine nomem % "\o'\(mo\(sl'" %
ndefine nomem % C^H-^H/ %
tdefine angstrom % "\fR\zA\v'-.3m'\h'.2m'\(de\v'.3m'\fP\h'.2m'" %
ndefine angstrom % A to o %
tdefine star %{ roman "\v'.5m'\s+3*\s-3\v'-.5m'" }%
ndefine star % * %
tdefine || % \(or\(or %
tdefine <wig % "\z<\v'.4m'\(ap\v'-.4m'" %
ndefine <wig %{ < from "~" }%
tdefine >wig % "\z>\v'.4m'\(ap\v'-.4m'" %
ndefine >wig %{ > from "~" }%
tdefine langle % "\s-3\b'\(sl\e'\s0" %
ndefine langle %<%
tdefine rangle % "\s-3\b'\e\(sl'\s0" %
ndefine rangle %>%
tdefine hbar % "\zh\v'-.6m'\h'.05m'\(ru\v'.6m'" %
ndefine hbar % h^H\u-\d %
ndefine ppd % _^H| %
tdefine ppd % "\o'\(ru\s-2\(or\s+2'" %
tdefine <-> % "\o'\(<-\(->'" %
ndefine <-> % "<-->" %
tdefine <=> % "\s-2\z<\v'.05m'\h'.2m'\z=\h'.55m'=\h'-.6m'\v'-.05m'>\s+2" %
ndefine <=> % "<=>" %
tdefine |< % "\o'<\(or'" %
ndefine |< % <^H/ %
tdefine |> % "\o'>\(or'" %
ndefine |> % /^H> %
tdefine ang % "\v'-.15m'\z\s-2\(sl\s+2\v'.15m'\(ru" %
ndefine ang % /^H_ %
tdefine rang % "\z\(or\h'.15m'\(ru" %
ndefine rang % L %
tdefine 3dot % "\v'-.8m'\z.\v'.5m'\z.\v'.5m'.\v'-.2m'" %
ndefine 3dot % .^H\u.^H\u.\d\d %
tdefine thf % ".\v'-.5m'.\v'.5m'." %
ndefine thf % ..^H\u.\d %
.EN
The PWB/UNIX* document entitled:
ABSTRACT


BC is a language and a compiler for doing arbitrary precision arithmetic
on the PDP-11 under the UNIX time-sharing system. The output of the compiler
is interpreted and executed by a collection of routines which can input,
output, and do arithmetic on indefinitely large integers and on scaled fixed-point
numbers.


These routines are themselves based on a dynamic storage allocator. 
Overflow does not occur until all available core storage is exhausted. 


The language has a complete control structure as well as immediate-mode 
operation. Functions can be defined and saved for later execution. 


Two five-hundred-digit numbers can be multiplied to give a thousand-digit
result in about ten seconds.


A small collection of library functions is also available, including sin, cos,
arctan, log, exponential, and Bessel functions of integer order.


Some of the uses of this compiler are
- to do computation with large integers,
- to do computation accurate to many decimal places,
- conversion of numbers from one base to another base.


BC — An Arbitrary Precision Desk-Calculator Language 


Lorinda Cherry 


Robert Morris 


Bell Laboratories,
Murray Hill, New Jersey 07974 


Introduction 


BC is a language and a compiler for doing arbitrary precision arithmetic on the UNIX 
time-sharing system [1]. The compiler was written to make conveniently available a collection 
of routines (called DC [6]) which are capable of doing arithmetic on integers of arbitrary size.
The compiler is by no means intended to provide a complete programming language. It is a 
minimal language facility. 

There is a scaling provision that permits the use of decimal point notation. Provision is 
made for input and output in bases other than decimal. Numbers can be converted from de- 
cimal to octal by simply setting the output base to equal 8. 


The actual limit on the number of digits that can be handled depends on the amount of 
storage available on the machine. Manipulation of numbers with many hundreds of digits is
possible even on the smallest versions of UNIX. 


The syntax of BC has been deliberately selected to agree substantially with the C 
language [2,3]. Those who are familiar with C will find few surprises in this language. 


Simple Computations with Integers 
The simplest kind of statement is an arithmetic expression on a line by itself. For
instance, if you type in the line:
142857 + 285714 
the program responds immediately with the line 
428571 


The operators -, *, /, %, and ^ can also be used; they indicate subtraction, multiplication, divi-
sion, remaindering, and exponentiation, respectively. Division of integers produces an integer
result truncated toward zero. Division by zero produces an error comment.
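Truncation toward zero matters for negative operands. The following minimal Python sketch models it; it is not BC itself, bc_div and bc_rem are hypothetical names, and the remainder is assumed to satisfy a = b*(a/b) + a%b, as in C. Python's own // operator rounds toward minus infinity, so it cannot be used directly when an operand is negative.

```python
def bc_div(a: int, b: int) -> int:
    """Integer quotient truncated toward zero, as described for BC."""
    if b == 0:
        # BC prints an error comment; this sketch raises an exception instead.
        raise ZeroDivisionError("divide by 0")
    q = abs(a) // abs(b)              # magnitude of the quotient
    return q if (a < 0) == (b < 0) else -q


def bc_rem(a: int, b: int) -> int:
    """Remainder chosen so that a == b * bc_div(a, b) + bc_rem(a, b)."""
    return a - b * bc_div(a, b)


print(bc_div(-7, 2), bc_rem(-7, 2))   # -3 -1 (Python's -7 // 2 would be -4)
```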


Any term in an expression may be prefixed by a minus sign to indicate that it is to be 
negated (the ‘unary’ minus sign). The expression 


7+-3


is interpreted to mean that -3 is to be added to 7.


More complex expressions with several operators and with parentheses are interpreted
just as in Fortran, with ^ having the greatest binding power, then * and % and /, and finally +
and -. Contents of parentheses are evaluated before material outside the parentheses. Ex-
ponentiations are performed from right to left and the other operators from left to right. The
two expressions


a^b^c and a^(b^c)

are equivalent, as are the two expressions
a*b*c and (a*b)*c

BC shares with Fortran and C the undesirable convention that
a/b*c is equivalent to (a/b)*c
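The two grouping rules can be seen with small integers. This is a minimal Python sketch, not BC; power_chain, div_then_mul, and trunc_div are hypothetical helpers that spell out the grouping explicitly, with ^ written as Python's **.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division truncated toward zero, matching BC at scale 0."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def power_chain(*operands: int) -> int:
    """Evaluate a^b^c^... grouping from the right, as BC groups ^."""
    result = operands[-1]
    for base in reversed(operands[:-1]):
        result = base ** result
    return result


def div_then_mul(a: int, b: int, c: int) -> int:
    """Evaluate a/b*c grouping from the left, that is (a/b)*c."""
    return trunc_div(a, b) * c


print(power_chain(2, 3, 2))    # 512, that is 2^(3^2), not (2^3)^2 = 64
print(div_then_mul(7, 2, 2))   # 6, because 7/2 is truncated to 3 first
```

The second line is why the convention is called undesirable: 7/(2*2) would give 1, and 7*2/2 would give 7.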


Internal storage registers to hold numbers have single lower-case letter names. The value 
of an expression can be assigned to a register in the usual way. The statement 


x=x+3


has the effect of increasing by three the value of the contents of the register named x. When, 
as in this case, the outermost operator is an =, the assignment is performed but the result is 
not printed. Only 26 of these named storage registers are available. 


There is a built-in square root function whose result is truncated to an integer (but see 
scaling below). The lines 


x = sqrt(191) 
x 


produce the printed result 
13 
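The truncated square root can be reproduced with integer arithmetic. This minimal Python sketch is not BC's algorithm; bc_sqrt is a hypothetical helper that handles only integer arguments and assumes, from the scaling rules given later, that the root keeps `scale` fraction digits in that case. Scaling the argument by 10 to the power 2*scale before taking the integer root gives the truncated digits directly.

```python
from math import isqrt


def bc_sqrt(n: int, scale: int = 0) -> str:
    """Square root of the non-negative integer n, truncated to `scale` fraction digits."""
    root = isqrt(n * 10 ** (2 * scale))       # floor(sqrt(n) * 10**scale)
    if scale == 0:
        return str(root)
    digits = str(root).rjust(scale + 1, "0")  # keep at least one integer digit
    return digits[:-scale] + "." + digits[-scale:]


print(bc_sqrt(191))      # 13
print(bc_sqrt(191, 2))   # 13.82
```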


Bases 
There are special internal quantities, called ‘ibase’ and ‘obase’. The contents of ‘ibase’,
initially set to 10, determine the base used for interpreting numbers read in. For example, the
lines

ibase = 8
11
will produce the output line

9


and you are all set up to do octal to decimal conversions. Beware, however, of trying to change
the input base back to decimal by typing


ibase = 10 


Because the number 10 is interpreted as octal, this statement will have no effect. For those
who deal in hexadecimal notation, the characters A-F are permitted in numbers (no matter
what base is in effect) and are interpreted as digits having values 10-15 respectively. The
statement


ibase = A 


will change you back to decimal input base no matter what the current input base is. Negative
and large positive input bases are permitted but useless. No mechanism has been provided for
the input of arbitrary numbers in bases less than 1 and greater than 16.
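The input rules above explain both the ‘ibase = 10’ trap and the ‘ibase = A’ cure. This minimal Python sketch is not BC's reader; read_number is a hypothetical helper that handles unsigned integer constants only. Each character is taken as a digit of the current input base, and A-F count as 10-15 whatever that base is.

```python
DIGITS = "0123456789ABCDEF"


def read_number(text: str, ibase: int) -> int:
    """Value of an unsigned integer constant typed while ibase is in effect."""
    value = 0
    for ch in text:
        # A-F are accepted as 10-15 in every base; other characters raise ValueError
        value = value * ibase + DIGITS.index(ch)
    return value


print(read_number("11", 8))   # 9
print(read_number("10", 8))   # 8: "ibase = 10" typed in octal leaves ibase at 8
print(read_number("A", 8))    # 10: "ibase = A" restores decimal input from any base
```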


The contents of ‘obase’, initially set to 10, are used as the base for output numbers. The 
lines 


obase = 16
1000 


will produce the output line 
3E8 


which is to be interpreted as a 3-digit hexadecimal number. Very large output bases are per- 
mitted, and they are sometimes useful. For example, large numbers can be output in groups of 
five digits by setting ‘obase’ to 100000. Strange (i.e. 1, 0, or negative) output bases are handled 
appropriately. 
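Output conversion is repeated division by the output base. The following minimal Python sketch is not BC's printer and does not reproduce its exact layout for large bases; to_base and format_digits are hypothetical helpers. It shows the hexadecimal example above and the way a base of 100000 splits a number into groups of five decimal digits.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n: int, obase: int) -> list[int]:
    """Digits of a non-negative integer n in base obase, most significant first."""
    if obase < 2:
        raise ValueError("this sketch only handles bases of 2 or more")
    digits = []
    while True:
        n, d = divmod(n, obase)   # peel off the least significant digit
        digits.append(d)
        if n == 0:
            return digits[::-1]


def format_digits(n: int, obase: int) -> str:
    """Spell out n with the characters 0-9 and A-F (bases up to 16 only)."""
    if obase > 16:
        raise ValueError("use to_base() for bases above 16")
    return "".join(DIGITS[d] for d in to_base(n, obase))


print(format_digits(1000, 16))          # 3E8
print(to_base(1234567890123, 100000))   # [123, 45678, 90123]
```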

Very large numbers are split across lines with 70 characters per line. Lines which are 
continued end with \. Decimal output conversion is practically instantaneous, but output of 
very large numbers (i.e., more than 100 digits) with other bases is rather slow. Non-decimal 
output conversion of a one-hundred-digit number takes about three seconds.


It is best to remember that ‘ibase’ and ‘obase’ have no effect whatever on the course of 
internal computation or on the evaluation of expressions, but only affect input and output 
conversion, respectively. 


Scaling


A third special internal quantity called ‘scale’ is used to determine the scale of calculated 
quantities. Numbers may have up to 99 decimal digits after the decimal point. This fractional 
part is retained in further computations. We refer to the number of digits after the decimal 
point of a number as its scale. 


When two scaled numbers are combined by means of one of the arithmetic operations, 
the result has a scale determined by the following rules. For addition and subtraction, the 
scale of the result is the larger of the scales of the two operands. In this case, there is never 
any truncation of the result. For multiplications, the scale of the result is never less than the 
maximum of the two scales of the operands, never more than the sum of the scales of the
operands and, subject to those two restrictions, the scale of the result is set equal to the con- 
tents of the internal quantity ‘scale’. The scale of a quotient is the contents of the internal 
quantity ‘scale’. The scale of a remainder is the sum of the scales of the quotient and the divi- 
sor. The result of an exponentiation is scaled as if the implied multiplications were performed. 
An exponent must be an integer. The scale of a square root is set to the maximum of the scale 
of the argument and the contents of ‘scale’. 
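The addition, multiplication, and division rules can be restated as a small fixed-point model. This is a minimal Python sketch under stated assumptions, not BC's implementation: Num, add, mul, div, and trunc_div are hypothetical names, a number is stored as an integer of all its digits plus a scale, and every discarded digit is truncated toward zero, never rounded. For multiplication, the result scale is min(sa + sb, max(scale, sa, sb)), which is the rule stated above. Remainders, exponentiation, and square roots are left out.

```python
from dataclasses import dataclass


def trunc_div(a: int, b: int) -> int:
    """Integer division truncated toward zero (digits are dropped, never rounded)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


@dataclass(frozen=True)
class Num:
    value: int   # all digits as one integer: 3.14 is stored as 314
    scale: int   # number of digits after the decimal point: 2 for 3.14

    def rescale(self, new_scale: int) -> "Num":
        """Pad with zeros, or truncate surplus fraction digits."""
        if new_scale >= self.scale:
            return Num(self.value * 10 ** (new_scale - self.scale), new_scale)
        return Num(trunc_div(self.value, 10 ** (self.scale - new_scale)), new_scale)

    def __str__(self) -> str:
        sign = "-" if self.value < 0 else ""
        digits = str(abs(self.value)).rjust(self.scale + 1, "0")
        if self.scale == 0:
            return sign + digits
        return f"{sign}{digits[:-self.scale]}.{digits[-self.scale:]}"


def add(a: Num, b: Num) -> Num:
    s = max(a.scale, b.scale)                 # larger scale; nothing is lost
    return Num(a.rescale(s).value + b.rescale(s).value, s)


def mul(a: Num, b: Num, scale: int) -> Num:
    exact = Num(a.value * b.value, a.scale + b.scale)
    s = min(a.scale + b.scale, max(scale, a.scale, b.scale))
    return exact.rescale(s)


def div(a: Num, b: Num, scale: int) -> Num:
    shift = scale + b.scale - a.scale         # quotient keeps exactly `scale` digits
    if shift >= 0:
        return Num(trunc_div(a.value * 10 ** shift, b.value), scale)
    return Num(trunc_div(a.value, b.value * 10 ** -shift), scale)


print(mul(Num(7, 0), Num(314, 2), scale=0))   # 21.98
print(mul(Num(15, 1), Num(15, 1), scale=0))   # 2.2 (2.25 truncated, not rounded)
```

The sketch prints a leading 0 for values below 1; it makes no claim about how BC itself formats such values.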


All of the internal operations are actually carried out in terms of integers, with digits be- 
ing discarded when necessary. In every case where digits are discarded, truncation and not 
rounding is performed. 


The contents of ‘scale’ must be no greater than 99 and no less than 0. It is initially set to
0. In case you need more than 99 fraction digits, you may arrange your own scaling.


The internal quantities ‘scale’, ‘ibase’, and ‘obase’ can be used in expressions just like
other variables. The line
scale = scale + 1
increases the value of ‘scale’ by one, and the line 
scale 


causes the current value of ‘scale’ to be printed. 


The value of ‘scale’ retains its meaning as a number of decimal digits to be retained in 
internal computation even when ‘ibase’ or ‘obase’ are not equal to 10. The internal computa- 
tions (which are still conducted in decimal, regardless of the bases) are performed to the 
specified number of decimal digits, never hexadecimal or octal or any other kind of digits. 




Functions 


The name of a function is a single lower-case letter. Function names are permitted to
collide with simple variable names. Twenty-six different defined functions are permitted in ad-
dition to the twenty-six variable names. The line


define a(x){


begins the definition of a function with one argument. This line must be followed by one or 
more statements, which make up the body of the function, ending with a right brace }. Return
of control from a function occurs when a return statement is executed or when the end of the 
function is reached. The return statement can take either of the two forms 


return 
return (x) 


In the first case, the value of the function is 0, and in the second, the value of the expression
in parentheses.
Variables used in the function can be declared as automatic by a statement of the form 


auto x, y, z


There can be only one ‘auto’ statement in a function and it must be the first statement in the 
definition. These automatic variables are allocated space and initialized to zero on entry to the 
function and thrown away on return. The values of any variables with the same names outside 
the function are not disturbed. Functions may be called recursively and the automatic vari- 
ables at each level of call are protected. The parameters named in a function definition are 
treated in the same way as the automatic variables of that function with the single exception 
that they are given a value on entry to the function. An example of a function definition is 


define a(x,y){
    auto z
    z = x*y
    return (z)
}


The value of this function, when called, will be the product of its two arguments. 


A function is called by the appearance of its name followed by a string of arguments en- 
closed in parentheses and separated by commas. The result is unpredictable if the wrong 
number of arguments is used. 


Functions with no arguments are defined and called using parentheses with nothing
between them: b().


If the function a above has been defined, then the line 
a(7,3.14) 
would cause the result 21.98 to be printed and the line 
x = a(a(3,4),5) 
would cause the value of x to become 60. 


Subscripted Variables 


A single lower-case letter variable name followed by an expression in brackets is called a 
subscripted variable (an array element). The variable name is called the array name and the 
expression in brackets is called the subscript. Only one-dimensional arrays are permitted. The 
names of arrays are permitted to collide with the names of simple variables and function 
names. Any fractional part of a subscript is discarded before use. Subscripts must be greater 
than or equal to zero and less than or equal to 2047. 


Subscripted variables may be freely used in expressions, in function calls, and in return
statements.


An array name may be used as an argument to a function, or may be declared as au- 
tomatic in a function definition by the use of empty brackets: 


f(a[])
define f(a[])
auto a[]


When an array name is so used, the whole contents of the array are copied for the use of the 
function, and thrown away on exit from the function. Array names which refer to whole ar- 
rays cannot be used in any other contexts. 


Control Statements 

The ‘if’, the ‘while’, and the ‘for’ statements may be used to alter the flow within pro-
grams or to cause iteration. The range of each of them is a statement or a compound state-
ment consisting of a collection of statements enclosed in braces. They are written in the fol-
lowing way


if(relation) statement 
while(relation) statement 
for(expression1; relation; expression2) statement


or 


if(relation) {statements} 
while(relation) {statements} 
for(expression1; relation; expression2) {statements}


A relation in one of the control statements is an expression of the form
x>y 


where two expressions are related by one of the six relational operators <, >, <=, >=, ==, 
or !=. The relation == stands for ‘equal to’ and != stands for ‘not equal to’. The meaning of 
the remaining relational operators is clear. 


BEWARE of using = instead of == in a relation. Unfortunately, both of them are le-
gal, so you will not get a diagnostic message, but = really will not do a comparison.


The ‘if’ statement causes execution of its range if and only if the relation is true. Then
control passes to the next statement in sequence.


The ‘while’ statement causes execution of its range repeatedly as long as the relation is 
true. The relation is tested before each execution of its range and if the relation is false, con- 
trol passes to the next statement beyond the range of the while. 


The ‘for’ statement begins by executing ‘expression1’. Then the relation is tested and, if
true, the statements in the range of the ‘for’ are executed. Then ‘expression2’ is executed. 
The relation is tested, and so on. The typical use of the ‘for’ statement is for a controlled itera- 
tion, as in the statement 
for(i=1; i<=10; i=i+1) i


which will print the integers from 1 to 10. Here are some examples of the use of the control
statements.


define f(n){
    auto i, x
    x=1
    for(i=1; i<=n; i=i+1) x=x*i
    return(x)
}


The line
f(a)


will print a factorial if a is a positive integer. Here is the definition of a function which will
compute values of the binomial coefficient (m and n are assumed to be positive integers).


define b(n,m){
    auto x, j
    x=1
    for(j=1; j<=m; j=j+1) x=x*(n-j+1)/j
    return(x)
}
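The binomial function relies on integer division that never loses digits. The following minimal Python sketch mirrors f and b above with scale 0; the names factorial and binomial are hypothetical, and the sketch is not BC. The reason the division is exact: after step j, x holds C(n, j), and C(n, j-1) * (n-j+1) equals j * C(n, j), so each product is a multiple of j.

```python
def factorial(n: int) -> int:
    """Mirror of the BC function f(n)."""
    x = 1
    for i in range(1, n + 1):
        x = x * i
    return x


def binomial(n: int, m: int) -> int:
    """Mirror of the BC function b(n, m) with scale = 0.

    Each product x * (n-j+1) is a multiple of j, so the truncating division
    discards nothing.  All values stay positive, so Python's // truncates
    the same way BC does here.
    """
    x = 1
    for j in range(1, m + 1):
        x = x * (n - j + 1) // j   # x now holds C(n, j)
    return x


print(factorial(5), binomial(52, 5))   # 120 2598960
```

Dividing by the whole of m! at the end would also be exact, but multiplying and dividing step by step keeps the intermediate numbers smaller.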


The following function computes values of the exponential function by summing the appropri- 
ate series without regard for possible truncation errors: 


scale = 20
define e(x){
    auto a, b, c, d, n
    a = 1
    b = 1
    c = 1
    d = 0
    n = 1
    while(1==1){
        a = a*x
        b = b*n
        c = c + a/b
        n = n + 1
        if(c==d) return(c)
        d = c
    }
}


Some Details 


There are some language features that every user should know about even if he will not 
use them. 


Normally statements are typed one to a line. It is also permissible to type several state- 
ments on a line separated by semicolons. 


If an assignment statement is parenthesized, it then has a value and it can be used any- 
where that an expression can. For example, the line 


(x=y+17)


not only makes the indicated assignment, but also prints the resulting value. 
Here is an example of a use of the value of an assignment statement even when it is not 
parenthesized. 
x = a[i=i+1]
causes a value to be assigned to x and also increments i before it is used as a subscript. 


The following constructs work in BC in exactly the same manner as they do in the C
language. Consult the appendix or the C manuals [2,3] for their exact workings.


x=y=z    is the same as    x=(y=z)
x =+ y                     x = x+y
x =- y                     x = x-y
x =* y                     x = x*y
x =/ y                     x = x/y
x =% y                     x = x%y
x =^ y                     x = x^y
x++                        (x=x+1)-1
x--                        (x=x-1)+1
++x                        x = x+1
--x                        x = x-1


Even if you don't intend to use the constructs, if you type one inadvertently, something 
correct but unexpected may happen. 


WARNING! In some of these constructions, spaces are significant. There is a real
difference between x=-y and x= -y. The first replaces x by x-y and the second by -y.


Three Important Things
1. To exit a BC program, type ‘quit’.
2. There is a comment convention identical to that of C and of PL/I. Comments begin
with ‘/*’ and end with ‘*/’.
3. There is a library of math functions which may be obtained by typing at command
level
bc -l


This command will load a set of library functions which, at the time of writing, consists of sine 
(named ‘s’), cosine (‘c’), arctangent (‘a’), natural logarithm (‘l’), exponential (‘e’), and Bessel
functions of integer order (‘j(n,x)’). Doubtless more functions will be added in time. The li- 
brary sets the scale to 20. You can reset it to something else if you like. The design of these 
mathematical library routines is discussed elsewhere [4]. 


If you type

	bc file ...


BC will read and execute the named file or files before accepting commands from the key- 
board. In this way, you may load your favorite programs and function definitions. 


Appendix


1. Notation 


In the following pages syntactic categories are in italics; literals are in bold; material in 
brackets [] is optional. 


2. Tokens 

Tokens consist of keywords, identifiers, constants, operators, and separators. Token 
separators may be blanks, tabs or comments. Newline characters or semicolons separate state- 
ments. 


2.1. Comments 
Comments are introduced by the characters /* and terminated by */.


2.2. Identifiers 


There are three kinds of identifiers: ordinary identifiers, array identifiers and function
identifiers. All three types consist of single lower-case letters. Array identifiers are followed by
square brackets, possibly enclosing an expression describing a subscript. Arrays are singly
dimensioned and may contain up to 2048 elements. Indexing begins at zero so an array may
be indexed from 0 to 2047. Subscripts are truncated to integers. Function identifiers are
followed by parentheses, possibly enclosing arguments. The three types of identifiers do not
conflict; a program can have a variable named x, an array named x and a function named x, all
of which are separate and distinct.


2.3. Keywords 
The following are reserved keywords: 


ibase if 
obase break 
scale define 
sqrt auto
length return 
while quit 
for 


2.4. Constants 


Constants consist of arbitrarily long numbers with an optional decimal point. The
hexadecimal digits A-F are also recognized as digits with values 10-15, respectively.


3. Expressions 

The value of an expression is printed unless the main operator is an assignment. Pre- 
cedence is the same as the order of presentation here, with highest appearing first. Left or 
right associativity, where applicable, is discussed with each operator. 
3.1. Primitive expressions


3.1.1. Named expressions 


Named expressions are places where values are stored. Simply stated, named expressions 
are legal on the left side of an assignment. The value of a named expression is the value 
stored in the place named. 


3.1.1.1. identifiers 
Simple identifiers are named expressions. They have an initial value of zero. 


3.1.1.2. array-name [ expression ]
Array elements are named expressions. They have an initial value of zero. 


3.1.1.3. scale, ibase and obase 


The internal registers scale, ibase and obase are all named expressions. scale is the 
number of digits after the decimal point to be retained in arithmetic operations. scale has an 
initial value of zero. ibase and obase are the input and output number radix respectively. Both 
ibase and obase have initial values of 10. 


3.1.2. Function calls 


3.1.2.1. function-name ( [ expression [ , expression ... ] ] )


A function call consists of a function name followed by parentheses containing a 
comma-separated list of expressions, which are the function arguments. A whole array passed 
as an argument is specified by the array name followed by empty square brackets. All function 
arguments are passed by value. As a result, changes made to the formal parameters have no 
effect on the actual arguments. If the function terminates by executing a return statement, the 
value of the function is the value of the expression in the parentheses of the return statement 
or is zero if no expression is provided or if there is no return statement. 


3.1.2.2. sqrt ( expression )


The result is the square root of the expression. The result is truncated in the least
significant decimal place. The scale of the result is the scale of the expression or the value of
scale, whichever is larger.


3.1.2.3. length ( expression) 


The result is the total number of significant decimal digits in the expression. The scale of 
the result is zero. 


3.1.2.4. scale ( expression )
The result is the scale of the expression. The scale of the result is zero. 


3.1.3. Constants 
Constants are primitive expressions. 
3.1.4. Parentheses


An expression surrounded by parentheses is a primitive expression. The parentheses are
used to alter the normal precedence.


3.2. Unary operators 
The unary operators bind right to left. 


3.2.1. - expression
The result is the negative of the expression.


3.2.2. ++ named-expression


The named expression is incremented by one. The result is the value of the named
expression after incrementing.


3.2.3. -- named-expression


The named expression is decremented by one. The result is the value of the named
expression after decrementing.


3.2.4. named-expression ++


The named expression is incremented by one. The result is the value of the named
expression before incrementing.


3.2.5. named-expression --


The named expression is decremented by one. The result is the value of the named
expression before decrementing.


3.3. Exponentiation operator
The exponentiation operator binds right to left.


3.3.1. expression ^ expression


The result is the first expression raised to the power of the second expression. The
second expression must be an integer. If a is the scale of the left expression and b is the
absolute value of the right expression, then the scale of the result is:


	min ( a×b, max ( scale, a ) )


3.4. Multiplicative operators 
The operators *, /, % bind left to right. 
3.4.1. expression * expression


The result is the product of the two expressions. If a and b are the scales of the two
expressions, then the scale of the result is:

	min ( a+b, max ( scale, a, b ) )


3.4.2. expression / expression 


The result is the quotient of the two expressions. The scale of the result is the value of
scale. 


3.4.3. expression % expression


The % operator produces the remainder of the division of the two expressions. More
precisely, a%b is a-a/b*b.


The scale of the result is the sum of the scale of the divisor and the value of scale.


3.5. Additive operators 
The additive operators bind left to right.


3.5.1. expression + expression 


The result is the sum of the two expressions. The scale of the result is the maximum of
the scales of the expressions. 


3.5.2. expression - expression


The result is the difference of the two expressions. The scale of the result is the max- 
imum of the scales of the expressions. 
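The result-scale rules in sections 3.3 through 3.5 can be gathered into a single function. This Python sketch only transcribes the rules stated above. The function name scale_of and its argument conventions are assumptions made for illustration.

```python
def scale_of(op: str, a: int, b: int, scale: int) -> int:
    """Scale of the result of `x op y` in BC, per sections 3.3-3.5.

    a     -- scale of the left operand
    b     -- scale of the right operand; for "^" it is instead the
             absolute value of the integer exponent
    scale -- current value of the scale register
    """
    if op == "^":
        return min(a * b, max(scale, a))
    if op == "*":
        return min(a + b, max(scale, a, b))
    if op == "/":
        return scale
    if op == "%":
        return b + scale              # scale of the divisor plus scale
    if op in ("+", "-"):
        return max(a, b)
    raise ValueError(f"unknown operator {op!r}")


# With scale = 0, 1.5*3.517 keeps 3 places but 1.5^3 keeps only 1.
print(scale_of("*", 1, 3, 0), scale_of("^", 1, 3, 0))   # 3 1
```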


3.6. assignment operators 
The assignment operators bind right to left. 


3.6.1. named-expression = expression


This expression results in assigning the value of the expression on the right to the named 
expression on the left. 


3.6.2. named-expression =+ expression
3.6.3. named-expression =- expression
3.6.4. named-expression =* expression
3.6.5. named-expression =/ expression
3.6.6. named-expression =% expression
3.6.7. named-expression =^ expression


The result of the above expressions is equivalent to "named-expression = named-expression
OP expression", where OP is the operator after the = sign.


4. Relations 


Unlike all other operators, the relational operators are only valid as the object of an if, 
while, or inside a for statement. 
4.1. expression < expression
4.2. expression > expression
4.3. expression <= expression
4.4. expression >= expression
4.5. expression == expression
4.6. expression != expression


5. Storage classes 


There are only two storage classes in BC, global and automatic (local). Only identifiers 
that are to be local to a function need be declared with the auto command. The arguments to 
a function are local to the function. All other identifiers are assumed to be global and available 
to all functions. All identifiers, global and local, have initial values of zero. Identifiers declared 
as auto are allocated on entry to the function and released on returning from the function. 
They therefore do not retain values between function calls. auto arrays are specified by the
array name followed by empty square brackets.


Automatic variables in BC do not work in exactly the same way as in either C or PL/I. 
On entry to a function, the old values of the names that appear as parameters and as automatic 
variables are pushed onto a stack. Until return is made from the function, reference to these 
names refers only to the new values. 
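As we read the push-on-entry rule above, one consequence is that a function called from inside another sees the caller's automatic variables rather than the globals. The Python sketch below models only that push/pop rule. The Machine class and its methods are hypothetical and do not come from BC's implementation.

```python
from collections import defaultdict


class Machine:
    """Every name owns a stack; its top entry is the value currently visible."""

    def __init__(self):
        self.stacks = defaultdict(lambda: [0])   # all names start at zero

    def get(self, name):
        return self.stacks[name][-1]

    def set(self, name, value):
        self.stacks[name][-1] = value

    def call(self, body, params, args, autos=()):
        # On entry: push a new slot for each parameter and auto name.
        for name, value in zip(params, args):
            self.stacks[name].append(value)
        for name in autos:
            self.stacks[name].append(0)
        try:
            return body(self)
        finally:
            # On return: pop them, so the caller's values reappear.
            for name in list(params) + list(autos):
                self.stacks[name].pop()


def g(m):            # g declares nothing; it just reads x
    return m.get("x")


def f(m):            # f declares "auto x", sets it to 5, then calls g
    m.set("x", 5)
    return m.call(g, (), ())


m = Machine()
m.set("x", 1)                               # global x = 1
print(m.call(f, (), (), autos=("x",)))      # 5: g sees f's x
print(m.get("x"))                           # 1: the global is back
```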


6. Statements 


Statements must be separated by semicolon or newline. Except where altered by control 
statements, execution is sequential. 


6.1. Expression statements 


When a statement is an expression, unless the main operator is an assignment, the value 
of the expression is printed, followed by a newline character. 


6.2. Compound statements 


Statements may be grouped together and used when one statement is expected by sur- 
rounding them with { }. 


6.3. Quoted string statements 
"any string” 
This statement prints the string inside the quotes. 


6.4. If statements 


if ( relation ) statement
The substatement is executed if the relation is true. 
6.5. While statements


while ( relation ) statement


The statement is executed while the relation is true. The test occurs before each execution
of the statement.


6.6. For statements 


for ( expression ; relation ; expression ) statement


The for statement is the same as

	first-expression
	while ( relation ) {
		statement
		last-expression
	}


All three expressions must be present. 


6.7. Break statements 


break 
break causes termination of a for or while statement. 


6.8. Auto statements 


auto identifier [ , identifier ]


The auto statement causes the values of the identifiers to be pushed down. The 
identifiers can be ordinary identifiers or array identifiers. Array identifiers are specified by fol- 
lowing the array name by empty square brackets. The auto statement must be the first state- 
ment in a function definition. 


6.9. Define statements 
define ( [ parameter [ , parameter ... ] ] ) {
	statements }


The define statement defines a function. The parameters may be ordinary identifiers or 
array names. Array names must be followed by empty square brackets. 


6.10. Return statements 
return 


return( expression) 

The return statement causes termination of a function, popping of its auto variables, and 
specifies the result of the function. The first form is equivalent to return(0). The result of the 
function is the result of the expression in parentheses.


6.11. Quit 


The quit statement stops execution of a BC program and returns control to UNIX when 
it is first encountered. Because it is not treated as an executable statement, it cannot be used
in a function definition or in an if, for, or while statement.


DC — An Interactive Desk Calculator 


Robert Morris 


Lorinda Cherry 


Bell Laboratories, 
Murray Hill, New Jersey 07974 


DC is an arbitrary precision arithmetic package implemented on the UNIX time-sharing 
system in the form of an interactive desk calculator. It works like a stacking calculator using 
reverse Polish notation. Ordinarily DC operates on decimal integers, but one may specify an 
input base, output base, and a number of fractional digits to be maintained.


A language called BC [1] has been developed which accepts programs written in the fami- 
liar style of higher-level programming languages and compiles output which is interpreted by 
DC. Some of the commands described below were designed for the compiler interface and are 
not easy for a human user to manipulate. 


Numbers that are typed into DC are put on a push-down stack. DC commands work by 
taking the top number or two off the stack, performing the desired operation, and pushing the 
result on the stack. If an argument is given, input is taken from that file until its end, then 
from the standard input. 


SYNOPTIC DESCRIPTION 

Here we describe the DC commands that are intended for use by people. The additional 
commands that are intended to be invoked by compiled output are described in the detailed 
description. 

Any number of commands are permitted on a line. Blanks and new-line characters are 
ignored except within numbers and in places where a register name is expected. 


The following constructions are recognized: 


number 


The value of the number is pushed onto the main stack. A number is an unbroken
string of the digits 0-9 and the capital letters A-F which are treated as digits with values
10-15 respectively. The number may be preceded by an underscore to input a negative
number. Numbers may contain decimal points.


+  -  *  /  %  ^
The top two values on the stack are added (+), subtracted (-), multiplied (*), divided
(/), remaindered (%), or exponentiated (^). The two entries are popped off the stack; the
result is pushed on the stack in their place. The result of a division is an integer truncated
toward zero. See the detailed description below for the treatment of numbers with decimal
points. An exponent must not have any digits after the decimal point.




sx
The top of the main stack is popped and stored into a register named x, where x may be
any character. If the s is capitalized, x is treated as a stack and the value is pushed onto
it. Any character, even blank or new-line, is a valid register name.

lx


The value in register x is pushed onto the stack. The register x is not altered. If the l is
capitalized, register x is treated as a stack and its top value is popped onto the main stack.


All registers start with empty value which is treated as a zero by the command l and is treated
as an error by the command L.


d 
The top value on the stack is duplicated. 
p 
The top value on the stack is printed. The top value remains unchanged. 
f
All values on the stack and in registers are printed.
x
treats the top element of the stack as a character string, removes it from the stack, and
executes it as a string of DC commands.
[ ... ]
puts the bracketed character string onto the top of the stack.
q 


exits the program. If executing a string, the recursion level is popped by two. If q is
capitalized, the top value on the stack is popped and the string execution level is popped by
that value.


<x  >x  =x  !<x  !>x  !=x


The top two elements of the stack are popped and compared. Register x is executed if 
they obey the stated relation. Exclamation point is negation. 


v
replaces the top element on the stack by its square root. The square root of an integer is
truncated to an integer. For the treatment of numbers with decimal points, see the detailed
description below.

!
interprets the rest of the line as a UNIX command. Control returns to DC when the
UNIX command terminates.


c
All values on the stack are popped: the stack becomes empty.
i
The top value on the stack is popped and used as the number radix for further input. If i
is capitalized, the value of the input base is pushed onto the stack. No mechanism has
been provided for the input of arbitrary numbers in bases less than 1 or greater than 16.
o
The top value on the stack is popped and used as the number radix for further output. If
o is capitalized, the value of the output base is pushed onto the stack.
k
The top of the stack is popped, and that value is used as a scale factor that influences the
number of decimal places that are maintained during multiplication, division, and
exponentiation. The scale factor must be greater than or equal to zero and less than 100. If
k is capitalized, the value of the scale factor is pushed onto the stack.
z 
The value of the stack level is pushed onto the stack. 
?
A line of input is taken from the input source (usually the console) and executed.
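A toy interpreter in Python can sketch the stack discipline that all of these commands share. It handles only integers, an underscore for negative input, the operators + - * / %, and the commands d, p, c, z, sx and lx. Scale, bases, strings and the remaining commands are deliberately left out. The function names are hypothetical.

```python
def tdiv(a: int, b: int) -> int:
    """Integer division truncated toward zero, as the text specifies."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def run(program: str) -> list[str]:
    """Run a DC-like program; return the lines that p would print."""
    stack: list[int] = []
    registers: dict[str, int] = {}
    printed: list[str] = []
    i = 0
    while i < len(program):
        ch = program[i]
        if ch.isdigit() or ch == "_":            # a number; "_" marks a negative
            j = i + 1
            while j < len(program) and program[j].isdigit():
                j += 1
            text = program[i:j]
            stack.append(-int(text[1:]) if text[0] == "_" else int(text))
            i = j
            continue
        if ch in "+-*/%":
            b, a = stack.pop(), stack.pop()      # a was pushed first
            if ch == "+":
                stack.append(a + b)
            elif ch == "-":
                stack.append(a - b)
            elif ch == "*":
                stack.append(a * b)
            elif ch == "/":
                stack.append(tdiv(a, b))
            else:                                # remainder has the dividend's sign
                stack.append(a - tdiv(a, b) * b)
        elif ch == "d":
            stack.append(stack[-1])              # duplicate the top value
        elif ch == "p":
            printed.append(str(stack[-1]))       # print without popping
        elif ch == "c":
            stack.clear()
        elif ch == "z":
            stack.append(len(stack))             # push the stack depth
        elif ch in "sl":
            reg = program[i + 1]                 # any character names a register
            if ch == "s":
                registers[reg] = stack.pop()
            else:
                stack.append(registers.get(reg, 0))  # empty register reads as 0
            i += 2
            continue
        elif not ch.isspace():
            raise ValueError(f"unsupported command {ch!r}")
        i += 1
    return printed


print(run("7 _2 / p  7 _2 % p  3 d * p  5 sa la la + p"))   # ['-3', '1', '9', '10']
```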
DETAILED DESCRIPTION 


Internal Representation of Numbers 


Numbers are stored internally using a dynamic storage allocator. Numbers are kept in
the form of a string of digits to the base 100 stored one digit per byte (centennial digits). The
string is stored with the low-order digit at the beginning of the string. For example, the
representation of 157 is 57,1. After any arithmetic operation on a number, care is taken that
all digits are in the range 0-99 and that the number has no leading zeros. The number zero is
represented by the empty string.


Negative numbers are represented in the 100's complement notation, which is analogous
to two's complement notation for binary numbers. The high order digit of a negative number
is always -1 and all other digits are in the range 0-99. The digit preceding the high order -1
digit is never a 99. The representation of -157 is 43,98,-1. We shall call this the canonical
form of a number. The advantage of this kind of representation of negative numbers is ease of
addition. When addition is performed digit by digit, the result is formally correct. The result
need only be modified, if necessary, to put it into canonical form.

Because the largest valid digit is 99 and the byte can hold numbers twice that large, addi- 
tion can be carried out and the handling of carries done later when that is convenient, as it 
sometimes is. 


An additional byte is stored with each number beyond the high order digit to indicate the
number of assumed decimal digits after the decimal point. The representation of .001 is 1,3
where the scale has been italicized to emphasize the fact that it is not the high order digit. The
value of this extra byte is called the scale factor of the number.
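The Python sketch below models this representation, with lists standing in for DC's byte strings and digits stored low order first. It checks the examples given above and shows the addition strategy: add digit by digit, then propagate carries and trim the high-order end into canonical form. The scale byte is omitted, and all names are hypothetical.

```python
def to_centennial(n: int) -> list[int]:
    """Canonical base-100 digits of n, low order first; negatives in 100's complement."""
    digits = []
    while n not in (0, -1):
        digits.append(n % 100)    # Python's % always yields 0..99 here
        n //= 100                 # floor division: negatives head toward -1
    if n == -1:
        digits.append(-1)         # high-order digit of a negative number
    return digits                 # zero is the empty list


def value(digits: list[int]) -> int:
    return sum(d * 100 ** i for i, d in enumerate(digits))


def canonical(digits: list[int]) -> list[int]:
    """Bring every digit into 0..99, then trim the high-order end."""
    out, carry = [], 0
    for d in digits:
        d += carry
        out.append(d % 100)
        carry = d // 100
    while carry not in (0, -1):          # a leftover carry becomes new digits
        out.append(carry % 100)
        carry //= 100
    if carry == -1:
        out.append(-1)
    while out and out[-1] == 0:          # strip leading zeros
        out.pop()
    while len(out) >= 2 and out[-1] == -1 and out[-2] == 99:
        out.pop()                        # 99,-1 at the top is the same as -1
        out[-1] = -1
    return out


def add(a: list[int], b: list[int]) -> list[int]:
    """Digit-by-digit sum (formally correct), then put into canonical form."""
    width = max(len(a), len(b))
    a = a + [0] * (width - len(a))
    b = b + [0] * (width - len(b))
    return canonical([x + y for x, y in zip(a, b)])


print(to_centennial(157), to_centennial(-157))        # [57, 1] [43, 98, -1]
print(add(to_centennial(157), to_centennial(-200)))   # [57, -1], that is, -43
```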


The Allocator 


DC uses a dynamic string storage allocator for all of its internal storage. All reading and 
writing of numbers internally is done through the allocator. Associated with each string in the 
allocator is a four-word header containing pointers to the beginning of the string, the end of 
the string, the next place to write, and the next place to read. Communication between the al- 
locator and DC is done via pointers to these headers. 


The allocator initially has one large string on a list of free strings. All headers except the
one pointing to this string are on a list of free headers. Requests for strings are made by size.
The size of the string actually supplied is the next higher power of 2. When a request for a
string is made, the allocator first checks the free list to see if there is a string of the desired
size. If none is found, the allocator finds the next larger free string and splits it repeatedly
until it has a string of the right size. Left-over strings are put on the free list. If there are no
larger strings, the allocator tries to coalesce smaller free strings into larger ones. Since all
strings are the result of splitting large strings, each string has a neighbor that is next to it in
core and, if free, can be combined with it to make a string twice as long. This is an
implementation of the ‘buddy system’ of allocation described in [2].
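The buddy idea can be sketched in Python over a simulated arena whose size is a power of 2. A block of size s at offset o has its buddy at offset o XOR s. One deliberate difference from the text: this sketch coalesces eagerly whenever a block is freed, whereas DC coalesces only when a request cannot otherwise be satisfied. The class and method names are hypothetical.

```python
class BuddyAllocator:
    """Toy buddy-system allocator over an arena of `total` bytes (a power of 2)."""

    def __init__(self, total: int):
        assert total > 0 and total & (total - 1) == 0, "total must be a power of 2"
        self.total = total
        self.free = {total: {0}}            # block size -> set of free offsets

    @staticmethod
    def round_up(n: int) -> int:
        size = 1
        while size < n:
            size *= 2                       # smallest power of 2 >= n
        return size

    def alloc(self, n: int):
        """Return (offset, size) of a block of at least n bytes, or None."""
        size = self.round_up(n)
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                          # look for a larger free block
        if s > self.total:
            return None                     # DC would ask the system for more space
        off = min(self.free[s])             # lowest offset, for determinism
        self.free[s].remove(off)
        while s > size:                     # split; upper halves go on the free list
            s //= 2
            self.free.setdefault(s, set()).add(off + s)
        return off, size

    def release(self, off: int, size: int) -> None:
        while size < self.total:
            buddy = off ^ size
            peers = self.free.get(size, set())
            if buddy not in peers:
                break
            peers.remove(buddy)             # merge with the free buddy
            off = min(off, buddy)
            size *= 2
        self.free.setdefault(size, set()).add(off)


heap = BuddyAllocator(64)
a = heap.alloc(3)        # (0, 4): the 64 block is split down to 4
b = heap.alloc(5)        # (8, 8): taken straight from the free list
heap.release(*a)
heap.release(*b)         # buddies merge all the way back up
print(heap.free[64])     # {0}
```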


Failing to find a string of the proper length after coalescing, the allocator asks the system 
for more space. The amount of space on the system is the only limitation on the size and
number of strings in DC. If at any time in the process of trying to allocate a string, the alloca- 
tor runs out of headers, it also asks the system for more space. 


There are routines in the allocator for reading, writing, copying, rewinding, forward-spacing,
and backspacing strings. All string manipulation is done using these routines.


The reading and writing routines increment the read pointer or write pointer so that the 
characters of a string are read or written in succession by a series of read or write calls. The 
write pointer is interpreted as the end of the information-containing portion of a string and a 
call to read beyond that point returns an end-of-string indication. An attempt to write beyond 
the end of a string causes the allocator to allocate a larger space and then copy the old string 
into the larger block. 


Internal Arithmetic 


All arithmetic operations are done on integers. The operands (or operand) needed for 
the operation are popped from the main stack and their scale factors stripped off. Zeros are ad- 
ded or digits removed as necessary to get a properly scaled result from the internal arithmetic 
routine. For example, if the scale of the operands is different and decimal alignment is re- 
quired, as it is for addition, zeros are appended to the operand with the smaller scale. After 
performing the required arithmetic operation, the proper scale factor is appended to the end of 
the number before it is pushed on the stack. 


A register called scale plays a part in the results of most arithmetic operations. scale is 
the bound on the number of decimal places retained in arithmetic computations. scale may be
set to the number on the top of the stack truncated to an integer with the k command. K may 
be used to push the value of scale on the stack. scale must be greater than or equal to 0 and 
less than 100. The descriptions of the individual arithmetic operations will include the exact 
effect of scale on the computations. 


Addition and Subtraction 


The scales of the two numbers are compared and trailing zeros are supplied to the 
number with the lower scale to give both numbers the same scale. The number with the
smaller scale is multiplied by 10 if the difference of the scales is odd. The scale of the result is 
then set to the larger of the scales of the two operands. 


Subtraction is performed by negating the number to be subtracted and proceeding as in 
addition. 


Finally, the addition is performed digit by digit from the low order end of the number.
The carries are propagated in the usual way. The resulting number is brought into canonical
form, which may require stripping of leading zeros, or for negative numbers replacing the
high-order configuration 99,-1 by the digit -1. In any case, digits which are not in the range
0-99 must be brought into that range, propagating any carries or borrows that result.
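The scale alignment can be sketched in Python by modeling a number as a pair (integer digits, scale), so 3.517 is (3517, 3). The text multiplies by an extra 10 when the scale difference is odd, but only because DC stores two decimal digits per byte. With Python integers that step folds into a single power of ten. The pair model and the helper names are assumptions. The example reproduces the 1.5 + 3.517 case used in the design discussion.

```python
def dc_add(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Add two (digits, scale) pairs after aligning their decimal points."""
    (xv, xs), (yv, ys) = x, y
    s = max(xs, ys)                                       # result scale
    return xv * 10 ** (s - xs) + yv * 10 ** (s - ys), s   # pad the smaller scale with zeros


def dc_sub(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Subtract by negating the subtrahend and adding, as the text describes."""
    yv, ys = y
    return dc_add(x, (-yv, ys))


def show(x: tuple[int, int]) -> str:
    """Format a (digits, scale) pair in ordinary decimal notation (hypothetical helper)."""
    v, s = x
    sign, v = ("-" if v < 0 else ""), abs(v)
    if s == 0:
        return sign + str(v)
    whole, frac = divmod(v, 10 ** s)
    return f"{sign}{whole}.{frac:0{s}d}"


print(show(dc_add((15, 1), (3517, 3))))   # 5.017
```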


Multiplication 


The scales are removed from the two operands and saved. The operands are both made 
positive. Then multiplication is performed in a digit by digit manner that exactly mimics the 
hand method of multiplying. The first number is multiplied by each digit of the second 
number, beginning with its low order digit. The intermediate products are accumulated into a 
partial sum which becomes the final product. The product is put into the canonical form and
its sign is computed from the signs of the original operands.


The scale of the result is set equal to the sum of the scales of the two operands. If that 
scale is larger than the internal register scale and also larger than both of the scales of the two 
operands, then the scale of the result is set equal to the largest of these three last quantities. 


Division 

The scales are removed from the two operands. Zeros are appended or digits removed 
from the dividend to make the scale of the result of the integer division equal to the internal 
quantity scale. The signs are removed and saved.


Division is performed much as it would be done by hand. The difference of the lengths 
of the two numbers is computed. If the divisor is longer than the dividend, zero is returned. 
Otherwise the top digit of the divisor is divided into the top two digits of the dividend. The 
result is used as the first (high-order) digit of the quotient. It may turn out to be one unit too
low, but if it is, the next trial quotient will be larger than 99 and this will be adjusted at the
end of the process. The trial digit is multiplied by the divisor and the result subtracted from
the dividend and the process is repeated to get additional quotient digits until the remaining
dividend is smaller than the divisor. At the end, the digits of the quotient are put into the
canonical form, with propagation of carry as needed. The sign is set from the sign of the
operands.


Remainder 


The division routine is called and division is performed exactly as described. The quanti- 
ty returned is the remains of the dividend at the end of the divide process. Since division 
truncates toward zero, remainders have the same sign as the dividend. The scale of the 
remainder is set to the maximum of the scale of the dividend and the scale of the quotient plus
the scale of the divisor. 
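The following Python sketch applies the division and remainder rules using a hypothetical (digits, scale) pair model. The dividend is rescaled so that the integer quotient comes out at the scale register's value, and the division truncates toward zero. The remainder is then taken at the scale the text specifies, so quotient times divisor plus remainder reproduces the dividend exactly. The names are hypothetical.

```python
def tdiv(a: int, b: int) -> int:
    """Integer division truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def rescale(x: tuple[int, int], s: int) -> int:
    """Digits of x = (v, xs) at scale s: append zeros, or drop digits toward zero."""
    v, xs = x
    if s >= xs:
        return v * 10 ** (s - xs)
    return tdiv(v, 10 ** (xs - s))


def dc_div(a, b, scale):
    """Quotient a/b with scale `scale`, truncated toward zero."""
    bv, bs = b
    av = rescale(a, scale + bs)      # so the integer quotient has scale `scale`
    return tdiv(av, bv), scale


def dc_rem(a, b, scale):
    """a - (a/b)*b at scale max(scale of a, scale of quotient + scale of b)."""
    q, qs = dc_div(a, b, scale)
    bv, bs = b
    s = max(a[1], qs + bs)
    r = rescale(a, s) - q * bv * 10 ** (s - qs - bs)
    return r, s


print(dc_div((1, 0), (3, 0), 3), dc_rem((1, 0), (3, 0), 3))   # (333, 3) (1, 3)
```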
Square Root


The scale is stripped from the operand. Zeros are added if necessary to make the integer 
result have a scale that is the larger of the internal quantity scale and the scale of the operand.


The method used to compute sqrt(y) is Newton's method with successive approximations
by the rule

$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{y}{x_n}\right)$$

The initial guess is found by taking the integer square root of the top two digits.
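The following Python sketch shows the integer Newton iteration together with the zero-padding step that gives the result its scale. DC takes its initial guess from the square root of the top two digits; this sketch instead derives one from the operand's bit length, since any guess at or above the true root works for this iteration. The names are hypothetical.

```python
def isqrt_newton(n: int) -> int:
    """Floor of sqrt(n) for n >= 0, by Newton's rule x <- (x + n//x) // 2."""
    if n < 0:
        raise ValueError("square root of a negative number")
    if n < 2:
        return n
    x = 1 << ((n.bit_length() + 1) // 2)    # starting guess, always >= sqrt(n)
    while True:
        y = (x + n // x) // 2
        if y >= x:                           # no further decrease: x is the floor root
            return x
        x = y


def dc_sqrt(v: int, vs: int, scale: int) -> tuple[int, int]:
    """Square root of the number with digits v and scale vs.

    The result scale is the larger of `scale` and vs; zeros are appended so the
    integer root comes out at exactly that scale, and the root is truncated.
    """
    s = max(scale, vs)
    return isqrt_newton(v * 10 ** (2 * s - vs)), s


print(dc_sqrt(2, 0, 5))   # (141421, 5), that is, 1.41421
```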


Exponentiation 


Only exponents with zero scale factor are handled. If the exponent is zero, then the 
result is 1. If the exponent is negative, then it is made positive and the base is divided into 
one. The scale of the base is removed. 


The integer exponent is viewed as a binary number. The base is repeatedly squared and 
the result is obtained as a product of those powers of the base that correspond to the positions 
of the one-bits in the binary representation of the exponent. Enough digits of the result are
removed to make the scale of the result the same as if the indicated multiplication had been
performed.
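Repeated squaring for integers can be sketched in Python. For example, 13 is 1101 in binary, so 3^13 is the product of 3^8, 3^4 and 3^1. The sketch covers only non-negative integer exponents and omits the scale adjustment described above. The function name is hypothetical.

```python
def power(base: int, exponent: int) -> int:
    """base ** exponent for an integer exponent >= 0, by repeated squaring."""
    if exponent < 0:
        raise ValueError("this sketch handles non-negative exponents only")
    result = 1                    # an exponent of zero gives 1
    square = base                 # base ** (2 ** k) at step k
    while exponent:
        if exponent & 1:          # this bit of the exponent is set: include this power
            result *= square
        square *= square
        exponent >>= 1
    return result


print(power(3, 13))   # 1594323
```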


Input Conversion and Base


Numbers are converted to the internal representation as they are read in. The scale 
stored with a number is simply the number of fractional digits input. Negative numbers are 
indicated by preceding the number with a _. The hexadecimal digits A-F correspond to the
numbers 10-15 regardless of input base. The i command can be used to change the base of
the input numbers. This command pops the stack, truncates the resulting number to an
integer, and uses it as the input base for all further input. The input base is initialized to 10 but
may, for example, be changed to 8 or 16 to do octal or hexadecimal to decimal conversions.
The command I will push the value of the input base on the stack.
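The following Python sketch converts integer input under the rules just stated. Because A-F mean 10-15 whatever the base, the rule as stated means that the input 1F in base 10 reads as 1×10+15 = 25. Fractional input is not handled, and the function name is hypothetical.

```python
DIGITS = "0123456789ABCDEF"   # A-F are 10-15 in every input base


def parse_dc_integer(text: str, ibase: int = 10) -> int:
    """Convert DC-style integer input; a leading "_" marks a negative number."""
    negative = text.startswith("_")
    if negative:
        text = text[1:]
    if not text:
        raise ValueError("no digits")
    value = 0
    for ch in text:
        if ch not in DIGITS:
            raise ValueError(f"not a digit: {ch!r}")
        value = value * ibase + DIGITS.index(ch)
    return -value if negative else value


print(parse_dc_integer("_1F", 16), parse_dc_integer("17", 8))   # -31 15
```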


Output Commands 


The command p causes the top of the stack to be printed. It does not remove the top of 
the stack. All of the stack and internal registers can be output by typing the command f. The 
o command can be used to change the output base. This command uses the top of the stack,
truncated to an integer as the base for all further output. The output base is initialized to 10.
It will work correctly for any base. The command O pushes the value of the output base on 
the stack. 


Output Format and Base 


The input and output bases only affect the interpretation of numbers on input and out- 
put; they have no effect on arithmetic computations. Large numbers are output with 70 char- 
acters per line; a \ indicates a continued line. All choices of input and output bases work 
correctly, although not all are useful. A particularly useful output base is 100000, which has
the effect of grouping digits in fives. Bases of 8 and 16 can be used for decimal-octal or
decimal-hexadecimal conversions.
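The short Python sketch below shows why base 100000 groups digits in fives: each output digit is a value from 0 to 99999, that is, five decimal digits. The text does not describe how DC prints such multi-character digits, so the sketch only computes them. The function name is hypothetical.

```python
def output_digits(n: int, obase: int) -> list[int]:
    """Digits of n >= 0 in base obase, most significant first."""
    if n < 0 or obase < 2:
        raise ValueError("this sketch needs n >= 0 and obase >= 2")
    digits = []
    while True:
        n, d = divmod(n, obase)
        digits.append(d)
        if n == 0:
            return digits[::-1]


print(output_digits(1234567890123, 100000))   # [123, 45678, 90123]
```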


Internal Registers


Numbers or strings may be stored in internal registers or loaded on the stack from
registers with the commands s and l. The command sx pops the top of the stack and stores the
result in register x. x can be any character. lx puts the contents of register x on the top of the
stack. The l command has no effect on the contents of register x. The s command, however,
is destructive. 


Stack Commands 


The command c clears the stack. The command d pushes a duplicate of the number on 
the top of the stack on the stack. The command z pushes the stack size on the stack. The 
command X replaces the number on the top of the stack with its scale factor. The command Z 
replaces the top of the stack with its length. 


Subroutine Definitions and Calls 


Enclosing a string in [] pushes the ascii string on the stack. The q command quits or, if
executing a string, pops the recursion levels by two.


Internal Registers — Programming DC 


The load and store commands together with [] to store strings, x to execute and the testing
commands ‘<’, ‘>’, ‘=’, ‘!<’, ‘!>’, ‘!=’ can be used to program DC. The x command assumes
the top of the stack is a string of DC commands and executes it. The testing commands
compare the top two elements on the stack and if the relation holds, execute the register
that follows the relation. For example, to print the numbers 0-9,


	[lip1+  si  li10>a]sa
	0si  lax


Push-Down Registers and Arrays 


These commands were designed for use by a compiler, not by people. They involve
push-down registers and arrays. In addition to the stack that commands work on, DC can be
thought of as having individual stacks for each register. These registers are operated on by the
commands S and L. Sx pushes the top value of the main stack onto the stack for the register
x. Lx pops the stack for register x and puts the result on the main stack. The commands s and
l also work on registers but not as push-down stacks. l doesn't affect the top of the register
stack, and s destroys what was there before.


The commands to work on arrays are : and ;. :x pops the stack and uses this value as an 
index into the array x. The next element on the stack is stored at this index in x. An index
must be greater than or equal to 0 and less than 2048. ;x is the command to load the main
stack from the array x. The value on the top of the stack is the index into the array x of the 
value to be loaded. 


Miscellaneous Commands 

The command ! interprets the rest of the line as a UNIX command and passes it to 
UNIX to execute. One other compiler command is Q. This command uses the top of the 
stack as the number of levels of recursion to skip. 


DESIGN CHOICES 


The real reason for the use of a dynamic storage allocator was that a general purpose program
could be (and in fact has been) used for a variety of other tasks. The allocator has some
value for input and for compiling (i.e., the bracket [...] commands) where it cannot be known
in advance how long a string will be. The result was that at a modest cost in execution time, 
all considerations of string allocation and sizes of strings were removed from the remainder of 
the program and debugging was made easier. The allocation method used wastes approximate- 
ly 25% of available space. 


The choice of 100 as a base for internal arithmetic seemingly has no compelling advan- 
tage. Yet the base cannot exceed 127 because of hardware limitations and at the cost of 5% in 
space, debugging was made a great deal easier and decimal output was made much faster. 


The reason for a stack-type arithmetic design was to permit all DC commands from addition
to subroutine execution to be implemented in essentially the same way. The result was a
considerable degree of logical separation of the final program into modules with very little
communication between modules.


The rationale for the lack of interaction between the scale and the bases was to provide 
an understandable means of proceeding after a change of base or scale when numbers had
already been entered. An earlier implementation which had global notions of scale and base did
not work out well. If the value of scale were to be interpreted in the current input or output 
base, then a change of base or scale in the midst of a computation would cause great confusion 
in the interpretation of the results. The current scheme has the advantage that the value of 
the input and output bases are only used for input and output, respectively, and they are ig- 
nored in all other operations. The value of scale is not used for any essential purpose by any 
part of the program and it is used only to prevent the number of decimal places resulting from 
the arithmetic operations from growing beyond ail bounds. 


The design rationale for the choice of scales for the results of arithmetic was that in
no case should any significant digits be thrown away if, on appearances, the user actually wanted
them. Thus, if the user wants to add the numbers 1.5 and 3.517, it seemed reasonable to
give him the result 5.017 without requiring him to specify his rather obvious requirements
for precision.


On the other hand, multiplication and exponentiation produce results with many
more digits than their operands, and it seemed reasonable to give as a minimum the number of
decimal places in the operands but not to give more than that number of digits unless the user
asked for them by specifying a value for scale. Square root can be handled in just the same
way as multiplication. The operation of division gives arbitrarily many decimal places, and
there is simply no way to guess how many places the user wants. In this case only, the user
must specify a scale to get any decimal places at all.
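
The scale rules for addition, multiplication and division can be written as three small C functions. This is a sketch that assumes "the number of decimal places in the operands" means the larger of the two operand scales. The multiplication formula is the one POSIX bc specifies. The function names are hypothetical.

```c
#include <assert.h>

static int max2(int a, int b) { return a > b ? a : b; }
static int min2(int a, int b) { return a < b ? a : b; }

/* Addition and subtraction: keep every digit of the more precise operand. */
static int add_scale(int sa, int sb) { return max2(sa, sb); }

/* Multiplication: at least the operands' places, never more than sa + sb;
   a larger user scale can ask for more, up to sa + sb. */
static int mul_scale(int sa, int sb, int scale)
{
    return min2(sa + sb, max2(scale, max2(sa, sb)));
}

/* Division: only the user's scale decides. */
static int div_scale(int scale) { return scale; }
```

With scale 0, 1.5 + 3.517 keeps 3 places (5.017), 1.5 * 3.517 also keeps 3, and a division yields an integer.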


The scale of remainder was chosen to make it possible to recreate the dividend from the 
quotient and remainder. This is easy to implement; no digits are thrown away. 
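
One way to meet that goal is to let the remainder carry whatever scale makes quotient * divisor + remainder equal the dividend exactly. The sketch below assumes a hypothetical fixed-point representation (an integer mantissa plus a count of decimal places) rather than dc's base-100 strings, and it ignores overflow.

```c
#include <assert.h>

/* Hypothetical decimal number: value = mant / 10^scale. */
typedef struct { long long mant; int scale; } dec;

static long long pow10ll(int n)
{
    long long p = 1;
    while (n-- > 0)
        p *= 10;
    return p;
}

/* Mantissa of x re-expressed at a scale s >= x.scale (exact). */
static long long at_scale(dec x, int s)
{
    return x.mant * pow10ll(s - x.scale);
}

/* Quotient truncated to `scale` decimal places. */
static dec dec_div(dec a, dec b, int scale)
{
    int s = a.scale > b.scale ? a.scale : b.scale;
    dec q = { at_scale(a, s) * pow10ll(scale) / at_scale(b, s), scale };
    return q;
}

/* Remainder r = a - q*b, kept at a scale large enough to be exact,
   so that q*b + r reproduces the dividend; no digits are thrown away. */
static dec dec_rem(dec a, dec b, int scale)
{
    dec q = dec_div(a, b, scale);
    int qb_scale = q.scale + b.scale;               /* scale of q*b */
    int rs = a.scale > qb_scale ? a.scale : qb_scale;
    dec r = { at_scale(a, rs) - q.mant * b.mant * pow10ll(rs - qb_scale), rs };
    return r;
}
```

With scale 2, 10 / 3 gives 3.33 and the remainder 0.01, and 3.33 * 3 + 0.01 is 10 again.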
ABSTRACT


Computer program input generally has some structure; in fact, every
computer program which does input can be thought of as defining an "input
language" which it accepts. The input languages may be as complex as a
programming language, or as simple as a sequence of numbers. Unfortunately,
standard input facilities are restricted, difficult to use and change, and do not
completely check their inputs for validity.


Yacc provides a general tool for controlling the input to a computer
program. The Yacc user describes the structures of his input, together with code
which is to be invoked when each such structure is recognized. Yacc turns
such a specification into a subroutine which may be invoked to handle the
input process; frequently, it is convenient and appropriate to have most of the
flow of control in the user's application handled by this subroutine.


The input subroutine produced by Yacc calls a user-supplied routine to
return the next basic input item. Thus, the user can specify his input in terms
of individual input characters, or, if he wishes, in terms of higher level
constructs such as names and numbers. The user-supplied routine may also
handle idiomatic features such as comment and continuation conventions, which
typically defy easy specification.


Yacc is written in C[7], and runs under UNIX. The subroutine which is
output may be in C or in Ratfor[4], at the user's choice; Ratfor permits
translation of the output subroutine into portable Fortran[5]. The class of
specifications accepted is a very general one, called LALR(1) grammars with
disambiguating rules. The theory behind Yacc has been described
elsewhere[1,2,3].


Yacc was originally designed to help produce the “front end” of com- 
pilers; in addition to this use, it has been successfully used in many application 
programs, including a phototypesetter language, a document retrieval system, a 
Fortran debugging system, and the Ratfor compiler. 


YACC — Yet Another Compiler-Compiler 


Stephen C. Johnson 


Bell Laboratories, 
Murray Hill, New Jersey 07974 


Section 0: Introduction 


Yacc provides a general tool for imposing structure on the input to a computer program.
The Yacc user prepares a specification of the input process; this includes rules which describe
the input structure, code which is to be invoked when these structures are recognized, and a
low-level routine to do the basic input. Yacc then produces a subroutine to do the input
procedure; this subroutine, called a parser, calls the user-supplied low-level input routine (called
the lexical analyzer) to pick up the basic items (called tokens) from the input stream. These
tokens are organized according to the input structure rules, called grammar rules; when one of
these rules has been recognized, the user code supplied for this rule, called an action, is
invoked. Actions can return values and make use of the values of other actions.


The heart of the input specification is a collection of grammar rules. Each rule describes
an allowable structure and gives it a name. For example, one grammar rule might be


date : month_name day ',' year ;


Here, date, month_name, day, and year represent structures of interest in the input process;
presumably, month_name, day, and year are defined elsewhere. The comma "," is quoted by
single quotes; this implies that the comma is to appear literally in the input. The colon and
semicolon merely serve as punctuation in the rule, and have no significance in controlling the
input. Thus, with proper definitions, the input


July 4, 1776 


might be matched by the above rule. 


As we mentioned above, an important part of the input process is carried out by the lexical
analyzer. This user routine reads the true input stream, recognizing those structures which
are more conveniently or more efficiently recognized directly, and communicates these
recognized tokens to the parser. For historical reasons, the name of a structure recognized by the
lexical analyzer is called a terminal symbol name, while the name of a structure recognized by
the parser is called a nonterminal symbol name. To avoid the obvious confusion of terminology,
we shall usually refer to terminal symbol names as token names.


There is considerable leeway in deciding whether to recognize structures by the lexical 
analyzer or by a grammar rule. Thus, in the above example it would be possible to have other 
rules of the form 


month_name : 'J' 'a' 'n' ;
month_name : 'F' 'e' 'b' ;
. . .
month_name : 'D' 'e' 'c' ;


Here, the lexical analyzer would only need to recognize individual letters, and month_name
would be a nonterminal symbol. Rules of this sort tend to be a bit wasteful of time and space,
and may even restrict the power of the input process (although they are easy to write). For a
more efficient input process, the lexical analyzer itself might recognize the month names, and
return an indication that a month_name was seen; in this case, month_name would be a token.


Literal characters, such as ',', must also be passed through the lexical analyzer, and are
considered tokens.
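
Here is a sketch of the more efficient arrangement: a C lexical analyzer that recognizes whole month names itself and passes literal characters such as ',' straight through. Several details are assumptions made for illustration. These include the token numbers MONTH_NAME and NUMBER, the in-memory input cursor, and the choice to return both day and year as NUMBER, so the grammar would read date : MONTH_NAME NUMBER ',' NUMBER. Returning a literal as its own character code follows the default numbering described in Section 3.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MONTH_NAME 257    /* hypothetical token numbers */
#define NUMBER     258

static int yylval;                   /* value accompanying the token */
static const char *input = "";       /* hypothetical input cursor */

static const char *const months[12] = {
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December"
};

/* Return the next token type, leaving its value in yylval. */
static int yylex(void)
{
    while (*input == ' ')
        input++;
    if (*input == '\0')
        return 0;                            /* end of input */
    if (isalpha((unsigned char)*input)) {
        char word[16];
        int n = 0;
        while (isalpha((unsigned char)*input) && n < 15)
            word[n++] = *input++;
        word[n] = '\0';
        for (int i = 0; i < 12; i++)
            if (strcmp(word, months[i]) == 0) {
                yylval = i + 1;              /* month number, 1..12 */
                return MONTH_NAME;
            }
        return -1;                           /* not a month name */
    }
    if (isdigit((unsigned char)*input)) {
        yylval = 0;
        while (isdigit((unsigned char)*input))
            yylval = yylval * 10 + (*input++ - '0');
        return NUMBER;
    }
    return (unsigned char)*input++;          /* literal token such as ',' */
}
```

On the input July 4, 1776 this yields MONTH_NAME (7), NUMBER (4), ',', NUMBER (1776), and then 0.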


As an example of the flexibility of the grammar rule approach, we might add to the above 
specifications the rule 


date : month '/' day '/' year ;
and thus optionally allow the form 
7/4/1776 
as a synonym for 
July 4, 1776 


In most cases, this new rule could be "slipped in" to a working system with minimal effort,
and with very little chance of disrupting existing input.


Frequently, the input being read does not conform to the specifications due to errors in
the input. The parsers produced by Yacc have the very desirable property that they will detect
these input errors at the earliest place at which this can be done with a left-to-right scan; thus,
not only is the chance of reading and computing with bad input data substantially reduced, but
the bad data can usually be quickly found. Error handling facilities, entered as part of the
input specifications, frequently permit the reentry of bad data, or the continuation of the input
process after skipping over the bad data.


In some cases, Yacc fails to produce a parser when given a set of specifications. For
example, the specifications may be self-contradictory, or they may require a more powerful
recognition mechanism than that available to Yacc. The former cases probably represent true
design errors; the latter cases can often be corrected by making the lexical analyzer more
powerful, or by rewriting some of the grammar rules. The class of specifications which Yacc
can handle compares very favorably with other systems of this type; moreover, the
constructions which are difficult for Yacc to handle are also frequently difficult for human beings
to handle. Some users have reported that the discipline of formulating valid Yacc specifications
for their input revealed errors of conception or design early in the program development.


The next several sections describe the basic process of preparing a Yacc specification;
Section 1 describes the preparation of grammar rules, Section 2 the preparation of the
user-supplied actions associated with these rules, and Section 3 the preparation of lexical analyzers.
In Section 4, we discuss the diagnostics produced when Yacc is unable to produce a parser from
the given specifications. This section also describes a simple, frequently useful mechanism for
handling operator precedences. Section 5 discusses error detection and recovery. Sections 6C
and 6R discuss the operating environment and special features of the subroutines which Yacc
produces in C and Ratfor, respectively. Section 7 gives some hints which may lead to better
designed, more efficient, and clearer specifications. Finally, Section 8 has a brief summary.
Appendix A has a brief example, and Appendix B tells how to run Yacc on the UNIX operating
system. Appendix C has a brief description of mechanisms and syntax which are no longer
actively supported, but which are provided for historical continuity with older versions of Yacc.


Section 1: Basic Specifications 


As we noted above, names refer to either tokens or nonterminal symbols. Yacc requires
those names which will be used as token names to be declared as such. In addition, for
reasons which will be discussed in Section 3, it is usually desirable to include the lexical analyzer
as part of the specification file; it may be useful to include other programs as well. Thus, every
specification file consists of three sections: the declarations, (grammar) rules, and programs. The
sections are separated by double percent "%%" marks. (The percent "%" is generally used in
Yacc specifications as an escape character.)


In other words, a full specification file looks like


declarations
%%
rules
%%
programs


The declaration section may be empty. Moreover, if the programs section is omitted, the 
second %% mark may be omitted also; thus, the smallest legal Yacc specification is 


%% 
rules 


Blanks, tabs, and newlines are ignored except that they may not appear in names or 
multi-character reserved symbols. Comments may appear wherever a name or operator is le- 
gal; they are enclosed in /*... */, as in C and PL/I. 


The rules section is made up of one or more grammar rules. A grammar rule has the 
form: 


A : BODY ; 


A represents a nonterminal name, and BODY represents a sequence of zero or more names 
and literals. Notice that the colon and the semicolon are Yacc punctuation. 




Names may be of arbitrary length, and may be made up of letters, dot ".", underscore
"_", and non-initial digits. Notice that Yacc considers upper and lower case letters to be
distinct. The names used in the body of a grammar rule may represent tokens or nonterminal
symbols.


A literal consists of a character enclosed in single quotes ('). As in C, the backslash (\)
is an escape character within literals, and all the C escapes are recognized. Thus

'\n'    represents newline
'\r'    represents return
'\''    represents single quote (')
'\\'    represents backslash (\)
'\t'    represents tab
'\b'    represents backspace
'\xxx'  represents "xxx" in octal


For a number of technical reasons, the nul character ('\0' or 0) should never be used in
grammar rules.
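
The escape rules above can be summarized by a small C function that turns the text between the quotes into a character code. It handles only the escapes listed here and rejects the nul character. The function name literal_value is hypothetical, and this is not how Yacc itself is coded.

```c
#include <assert.h>

/* Map the text between the single quotes of a literal to its character
   code. Returns -1 for a malformed literal or for nul. */
static int literal_value(const char *s)
{
    int v;
    const char *end;

    if (s[0] == '\0')
        return -1;                          /* empty literal */
    if (s[0] != '\\') {                     /* plain character, e.g. + */
        v = (unsigned char)s[0];
        end = s + 1;
    } else if (s[1] >= '0' && s[1] <= '7') {
        v = 0;                              /* \xxx: up to three octal digits */
        end = s + 1;
        for (int i = 0; i < 3 && *end >= '0' && *end <= '7'; i++)
            v = v * 8 + (*end++ - '0');
    } else {
        switch (s[1]) {
        case 'n':  v = '\n'; break;
        case 'r':  v = '\r'; break;
        case '\'': v = '\''; break;
        case '\\': v = '\\'; break;
        case 't':  v = '\t'; break;
        case 'b':  v = '\b'; break;
        default:   return -1;               /* escape not in the list */
        }
        end = s + 2;
    }
    if (*end != '\0' || v == 0)
        return -1;                          /* trailing text, or nul */
    return v;
}
```

For instance, the literal '\101' has the value 65, while '\0' is rejected.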

If there are several grammar rules with the same left hand side, the vertical bar "|" can
be used to avoid rewriting the left hand side. In addition, the semicolon at the end of a rule
can be dropped before a vertical bar. Thus the grammar rules


A : B C D ;
A : E F ;
A : G ;


can be given to Yacc as


A : B C D |
    E F |
    G ;


It is not necessary that all grammar rules with the same left side appear together in the
grammar rules section, although it makes the input much more readable, and easier to change.


If a nonterminal symbol matches the empty string, this can be indicated in the obvious 
way: 
empty: ; 


As we mentioned above, names which represent tokens must be declared as such. The 
simplest way of doing this is to write 


%token name1 name2 ...


in the declarations section. (See Sections 3 and 4 for much more discussion.) Every name not
defined in the declarations section is assumed to represent a nonterminal symbol. If, by the
end of the rules section, some nonterminal symbol has not appeared on the left of any rule,
then an error message is produced and Yacc halts.


The left hand side of the first grammar rule in the grammar rules section has special
importance; it is taken to be the controlling nonterminal symbol for the entire input process; in
technical language it is called the start symbol. In effect, the parser is designed to recognize the
start symbol; thus, this symbol generally represents the largest, most general structure
described by the grammar rules.


The end of the input is signaled by a special token, called the endmarker. If the tokens 
up to, but not including, the endmarker form a structure which matches the start symbol, the 
parser subroutine returns to its caller when the endmarker is seen; we say that it accepts the 
input. If the endmarker is seen in any other context, it is an error. 


It is the job of the user-supplied lexical analyzer to return the endmarker when
appropriate; see Section 3, below. Frequently, the endmarker token represents some reasonably
obvious I/O status, such as "end-of-file" or "end-of-record".


Section 2: Actions 


With each grammar rule, the user may associate an action to be performed each time the
rule is recognized in the input process. This action may return a value, and may obtain the
values returned by previous actions in the grammar rule. In addition, the lexical analyzer can
return values for tokens, if desired.


When invoking Yacc, the user specifies a programming language: currently, Ratfor and C
are supported. An action is an arbitrary statement in this language, and as such can do input
and output, call subprograms, and alter external vectors and variables (recall that a "statement"
in both C and Ratfor can be compound and do many distinct tasks). An action is specified by
an equal sign "=" at the end of a grammar rule, followed by one or more statements, enclosed
in curly braces "{" and "}". For example,


A : '(' B ')' = { hello( 1, "abc" ); }


and 


XXX : YYY ZZZ =
	{
	printf("a message\n");
	flag = 25;
	}


are grammar rules with actions in C. A grammar rule with an action need not end with a sem- 
icolon; in fact, it is an error to have a semicolon before the equal sign. 


To facilitate easy communication between the actions and the parser, the action
statements are altered slightly. The dollar sign "$" is used as a signal to Yacc in this
context.


To return a value, the action normally sets the pseudo-variable "$$" to some integer
value. For example, an action which does nothing but return the value 1 is


	= { $$ = 1; }


To obtain the values returned by previous actions and the lexical analyzer, the action
may use the (integer) pseudo-variables $1, $2, ..., which refer to the values returned by the
components of the right side of a rule, reading from left to right. Thus, if the rule is


A : B C D ;


for example, then $2 has the value returned by C, and $3 the value returned by D.
As a more concrete example, we might have the rule
expression : '(' expression ')' ;
We wish the value returned by this rule to be the value of the expression in parentheses.
Then we write


expression : '(' expression ')' = { $$ = $2; }


As a default, the value of a rule is the value of the first element in it ($1). This is true
even if there is no explicit action given for the rule. Thus, grammar rules of the form


A : B ;
frequently need not have an explicit action.
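
One way to picture $$, $1, $2, ... is as positions on a stack of values kept alongside the parse. The C sketch below is a hypothetical model, not the code Yacc generates. Reducing by a rule with n right-hand symbols reads $i from the top n entries, pops them, and pushes $$, which defaults to $1 when no action is given.

```c
#include <assert.h>

/* Hypothetical model of the value stack. */
static int vstack[100];
static int vtop = 0;                   /* number of values on the stack */

static void push(int v) { vstack[vtop++] = v; }

/* $i for a rule with n right-hand symbols: the i-th of the top n values. */
#define DOLLAR(i, n) (vstack[vtop - (n) + (i) - 1])

/* Reduce by a rule with n right-hand symbols. If action is NULL the
   default $$ = $1 applies (an empty rule has no $1; 0 is used here).
   The n values are popped and $$ is pushed for the left-hand side. */
static void reduce(int n, int (*action)(int n))
{
    int result = action ? action(n) : (n > 0 ? DOLLAR(1, n) : 0);
    vtop -= n;
    push(result);
}

/* expression : '(' expression ')' = { $$ = $2; } */
static int paren_action(int n) { return DOLLAR(2, n); }
```

Pushing the values for '(', 42 and ')' and reducing with paren_action leaves the single value 42. A rule A : B with no action simply passes B's value up.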


Notice that, although the values of actions are integers, these integers may in fact contain 
pointers (in C) or indices into an array (in Ratfor); in this way, actions can return and refer- 
ence more complex data structures. 


Sometimes, we wish to get control before a rule is fully parsed, as well as at the end of
the rule. There is no explicit mechanism in Yacc to allow this; the same effect can be
obtained, however, by introducing a new symbol which matches the empty string, and inserting
an action for this symbol. For example, we might have a rule describing an "if" statement:


statement : IF '(' expr ')' THEN statement


Suppose that we wish to get control after seeing the right parenthesis in order to output some
code. We might accomplish this by the rules:


statement : IF '(' expr ')' actn THEN statement
	= { call action1 }


actn : /* matches the empty string */
	= { call action2 }


Thus, the new nonterminal symbol actn matches no input, but serves only to call action2
after the right parenthesis is seen.


Frequently, it is more natural in such cases to break the rule into parts where the action
is needed. Thus, the above example might also have been written


statement : ifpart THEN statement
	= { call action1 }


ifpart : IF '(' expr ')'
	= { call action2 }


In many applications, output is not done directly by the actions; rather, a data structure,
such as a parse tree, is constructed in memory, and transformations are applied to it before
output is generated. Parse trees are particularly easy to construct, given routines which build and
maintain the tree structure desired. For example, suppose we have a C function "node",
written so that the call


node( L, n1, n2 )


creates a node with label L, and descendants n1 and n2, and returns a pointer to the newly
created node. Then we can cause a parse tree to be built by supplying actions such as:


expr : expr '+' expr
	= { $$ = node( '+', $1, $3 ); }


in our specification. 
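
A minimal C version of node might look as follows. The struct layout is an assumption; the text only specifies the call node( L, n1, n2 ) and that it returns a pointer to the new node. The scheme above stores that pointer in an integer value, which modern C does not guarantee to work, so this sketch keeps pointers typed as pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parse-tree node. */
struct node {
    int label;                     /* e.g. '+' */
    struct node *left, *right;     /* descendants; NULL for leaves */
};

/* Create a node with label L and descendants n1 and n2, and return a
   pointer to it (NULL if memory is exhausted). */
static struct node *node(int L, struct node *n1, struct node *n2)
{
    struct node *p = malloc(sizeof *p);
    if (p != NULL) {
        p->label = L;
        p->left = n1;
        p->right = n2;
    }
    return p;
}
```

The action for expr : expr '+' expr then amounts to node('+', left, right), where left and right are the trees already built for $1 and $3.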


The user may define other variables to be used by the actions. Declarations and
definitions can appear in two places in the Yacc specification: in the declarations section, and
at the head of the rules section, before the first grammar rule. In each case, the declarations
and definitions are enclosed in the marks "%{" and "%}". Declarations and definitions placed
in the declarations section have global scope, and are thus known to the action statements and
the lexical analyzer. Declarations and definitions placed at the head of the rules section have
scope local to the action statements. Thus, in the above example, we might have included


%{ int variable = 0; %}
in the declarations section, or, perhaps,
%{ static int variable; %}


at the head of the rules section. If we were writing Ratfor actions, we might want to include
some COMMON statements at the beginning of the rules section, to allow for easy
communication between the actions and other routines. For both C and Ratfor, Yacc uses only
external names beginning in "yy"; the user should avoid such names.


Section 3: Lexical Analysis 


The user must supply a lexical analyzer which reads the input stream and communicates
tokens (with values, if desired) to the parser. The lexical analyzer is an integer valued function
called yylex, in both C and Ratfor. The function returns an integer which represents the type
of the token. The value to be associated in the parser with that token is assigned to the integer
variable yylval. Thus, a lexical analyzer written in C should begin


yylex( ) {
	extern int yylval;


while a lexical analyzer written in Ratfor should begin


integer function yylex(yylval)
	integer yylval


Clearly, the parser and the lexical analyzer must agree on the type numbers in order for
communication between them to take place. These numbers may be chosen by Yacc, or
chosen by the user. In either case, the "define" mechanisms of C and Ratfor are used to allow
the lexical analyzer to return these numbers symbolically. For example, suppose that the
token name DIGIT has been defined in the declarations section of the specification. The
relevant portion of the lexical analyzer (in C) might look like:


yylex( ) {
	extern int yylval;
	int c;
	...
	c = getchar( );
	...
	if( c >= '0' && c <= '9' ) {
		yylval = c - '0';
		return( DIGIT );
		}
	...


The relevant portion of the Ratfor lexical analyzer might look like:


integer function yylex(yylval)
	integer yylval, digits(10), c
	...
	data digits(1) / "0" /;
	data digits(2) / "1" /;
	...
	data digits(10) / "9" /;
	...
	# set c to the next input character
	...
	do i = 1, 10 {
		if(c .EQ. digits(i)) {
			yylval = i-1
			yylex = DIGIT
			return
			}
		}
	...


In both cases, the intent is to return a token type of DIGIT, and a value equal to the nu- 
merical value of the digit. Provided that the lexical analyzer code is placed in the programs 
section of the specification, the identifier DIGIT will be redefined to be equal to the type 
number associated with the token name DIGIT. 
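
Putting the C fragment together gives a complete lexical analyzer that can be tested. This sketch makes two assumptions to stay self-contained. First, it reads from a string through a hypothetical next_char() instead of getchar(). Second, it hard-codes DIGIT as 257, the number we assume Yacc would assign by default to the first declared token name; a real program would use the definition Yacc generates. Any other character is returned as its own literal type number, and 0 signals the end of input.

```c
#include <assert.h>

#define DIGIT 257      /* assumed default number for the first token name */

int yylval;                          /* token value, read by the parser */
static const char *src = "";         /* hypothetical in-memory input */

/* Stands in for getchar(): next character, or 0 at end of input. */
static int next_char(void)
{
    return *src ? (unsigned char)*src++ : 0;
}

int yylex(void)
{
    int c = next_char();
    if (c >= '0' && c <= '9') {
        yylval = c - '0';            /* numerical value of the digit */
        return DIGIT;
    }
    return c;   /* literal: its character code; 0 at end is the endmarker */
}
```

On the input 7+ this returns DIGIT with yylval set to 7, then '+', then 0.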


This mechanism leads to clear and easily modified lexical analyzers; the only pitfall is
that it makes it important to avoid using any names in the grammar which are reserved or
significant in the chosen language; thus, in both C and Ratfor, the use of token names of "if"
or "yylex" will almost certainly cause severe difficulties when the lexical analyzer is compiled.
The token name "error" is reserved for error handling, and should not be used naively (see
Section 5).


As mentioned above, the type numbers may be chosen by Yacc or by the user. In the
default situation, the numbers are chosen by Yacc. The default type number for a literal
character is the numerical value of the character, considered as a 1-byte integer. Other token
names are assigned type numbers starting at 257. It is a difficult, machine-dependent operation
to determine the numerical value of an input character in Ratfor (or Fortran). Thus, the
Ratfor user of Yacc will probably wish to set his own type numbers, or not use any literals in his
specification.

To assign a type number to a token (including literals), the first appearance of the token
name or literal in the declarations section can be immediately followed by a nonnegative integer.
This integer is taken to be the type number of the name or literal. Names and literals not
defined by this mechanism retain their default definition. It is important that all type numbers
be distinct.


There is one exception to this situation. For sticky historical reasons, the endmarker
must have type number 0. Note that this is not unattractive in C, since the nul character is
returned upon end of file; in Ratfor, it makes no sense. This type number cannot be redefined
by the user; thus, all lexical analyzers should be prepared to return 0 as a type number upon
reaching the end of their input.


Section 4: Ambiguity, Conflicts, and Precedence 


A set of grammar rules is ambiguous if there is some input string which can be structured
in two or more different ways. For example, the grammar rule


expr : expr '-' expr ;


is a natural way of expressing the fact that one way of forming an arithmetic expression is to
put two other expressions together with a minus sign between them. Unfortunately, this
grammar rule does not completely specify the way that all complex inputs should be structured.
For example, if we have input of the form


expr - expr - expr
the rule would permit us to treat this input either as
( expr - expr ) - expr
or as
expr - ( expr - expr )
(We speak of the first as left association of operators, and the second as right association.)


Yacc detects such ambiguities when it is attempting to build the parser. It is instructive
to consider the problem that confronts the parser when it is given an input such as


expr - expr - expr
When the parser has read the second expr, the input which it has seen:
expr - expr


matches the right side of the grammar rule above. One valid thing for the parser to do is to
reduce the input it has seen by applying this rule; after applying the rule, it would have
reduced the input it had already seen to expr (the left side of the rule). It could then read the
final part of the input:


- expr


and again reduce by the rule. We see that the effect of this is to take the left associative
interpretation.


Alternatively, when the parser has seen
expr - expr


it could defer the immediate application of the rule, and continue reading (the technical term
is shifting) the input until it had seen


expr - expr - expr


It could then apply the grammar rule to the rightmost three symbols, reducing them to expr
and leaving


expr - expr


Now it can reduce by the rule again; the effect is to take the right associative interpretation.
Thus, having read


expr - expr


the parser can do two legal things, a shift or a reduction, and has no way of deciding between
them. We refer to this as a shift/reduce conflict. It may also happen that the parser has a choice
of two legal reductions; this is called a reduce/reduce conflict.
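
The two groupings really do compute different things. Substitute 8, 4 and 2 for the three expr's. The left associative reading gives (8 - 4) - 2 = 2, while the right associative one gives 8 - (4 - 2) = 6. This small C sketch shows the difference; the helper names are hypothetical.

```c
#include <assert.h>

/* Left association: ((x0 - x1) - x2) - ...,
   what the parser gets by reducing as soon as it can. */
static int sub_left(const int *x, int n)
{
    int acc = x[0];
    for (int i = 1; i < n; i++)
        acc = acc - x[i];
    return acc;
}

/* Right association: x0 - (x1 - (x2 - ...)),
   what the parser gets by shifting everything and reducing at the end. */
static int sub_right(const int *x, int n)
{
    int acc = x[n - 1];
    for (int i = n - 2; i >= 0; i--)
        acc = x[i] - acc;
    return acc;
}
```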


When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser. It 
does this by selecting one of the valid steps wherever it has a choice. A rule which describes 
which choice to make in a given situation is called a disambiguating rule. 


Yacc has two disambiguating rules which are invoked by default, in the absence of any 
user directives to the contrary: 


1. In a shift/reduce conflict, the default is to do the shift.


2. In a reduce/reduce conflict, the default is to reduce by the earlier grammar rule (in the 
input sequence). 


Rule 1 implies that reductions are deferred whenever there is a choice, in favor of shifts.
Rule 2 gives the user rather crude control over the behavior of the parser in this situation, but
the proper use of reduce/reduce conflicts is still a black art, and is properly considered an
advanced topic.


Conflicts may arise because of mistakes in input or logic, or because the grammar rules,
while consistent, require a more complex parser than Yacc can construct. In these cases, the
application of disambiguating rules is inappropriate, and leads to a parser which is in error. For
this reason, Yacc always reports the number of shift/reduce and reduce/reduce conflicts which
were resolved by Rule 1 and Rule 2.


In general, whenever it is possible to apply disambiguating rules to produce a correct 
parser, it is also possible to rewrite the grammar rules so that the same inputs are read, but 
there are no conflicts. For this reason, most previous systems like Yacc have considered 
conflicts to be fatal errors. Our experience has suggested that this rewriting is somewhat unna- 
tural to do, and produces slower parsers; thus, Yacc will produce parsers even in the presence 
of conflicts. 


As an example of the power of disambiguating rules, consider a fragment from a
programming language involving an "if-then-else" construction:


stat : IF '(' cond ')' stat |
       IF '(' cond ')' stat ELSE stat ;


Here, we consider IF and ELSE to be tokens, cond to be a nonterminal symbol describing
conditional (logical) expressions, and stat to be a nonterminal symbol describing statements. In
the following, we shall refer to these two rules as the simple-if rule and the if-else rule,
respectively.
These two rules form an ambiguous construction, since input of the form
IF ( C1 ) IF ( C2 ) S1 ELSE S2
can be structured according to these rules in two ways:


IF ( C1 ) {
	IF ( C2 ) S1
	}
ELSE S2
or


IF ( C1 ) {
	IF ( C2 ) S1
	ELSE S2
	}


The second interpretation is the one given in most programming languages which have this
construct. Each ELSE is associated with the last preceding "un-ELSE'd" IF. In this example,
consider the situation where the parser has seen


IF ( C1 ) IF ( C2 ) S1
and is looking at the ELSE. It can immediately reduce by the simple-if rule to get
IF ( C1 ) stat
and then read the remaining input,
ELSE S2
and reduce
IF ( C1 ) stat ELSE S2


by the if-else rule. This leads to the first of the above groupings of the input.
On the other hand, we may shift the ELSE and read S2, and then reduce the right hand
portion of

IF ( C1 ) IF ( C2 ) S1 ELSE S2
by the if-else rule to get

IF ( C1 ) stat
which can be reduced by the simple-if rule. This leads to the second of the above groupings
of the input, which is usually desired.


Once again the parser can do two valid things; we have a shift/reduce conflict. The
application of disambiguating rule 1 tells the parser to shift in this case, which leads to the
desired grouping.
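
C itself has exactly this construct and resolves it the same way rule 1 does: an else attaches to the nearest unmatched if. The function below is hypothetical, for illustration only. It shows that when C1 is false, neither S1 nor S2 runs, which would not be the case if the ELSE belonged to the outer IF. Compilers may warn about the missing braces.

```c
#include <assert.h>

/* Returns 1 if S1 ran, 2 if S2 ran, 0 if neither. */
static int which(int c1, int c2)
{
    int r = 0;
    if (c1)
        if (c2)
            r = 1;        /* S1 */
        else
            r = 2;        /* S2: this else belongs to the inner if */
    return r;
}
```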

Notice that this shift/reduce conflict arises only when there is a particular current input
symbol, ELSE, and particular inputs already seen, such as


IF ( C1 ) IF ( C2 ) S1


In general, there may be many conflicts, and each one will be associated with an input symbol
and a set of previously read inputs. The previously read inputs are characterized by the state of
the parser, which is assigned a nonnegative integer. The number of states in the parser is
typically two to five times the number of grammar rules.
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When Yacc is invoked with the verbose (—v) option (see Appendix B), it produces a file 
of user output which includes a description of the states in the parser. For example, the output 
corresponding to the above exampie might be: 


23: shift/reduce Conflict (Shift 45, Reduce 18) on ELSE 
State 23 


stat : IF ( cond ) stat_ 
stat : IF (cond ) stat_ELSE stat 


ELSE shift 45 
reduce |8 


The first line describes the conflict, giving the state and the input symbol. The state title fol- 
lows, and a brief description of the grammar rules which are active in this state. The underline 
‘“* ’’ describes the portions of the grammar rules which have been seen. Thus in the example, 


in state 23 we have seen input which corresponds to 
IF ( cond ) stat 


and the two grammar rules shown are active at this time. The actions possible are, if the input 
symbol is ELSE, we may shift into state 45. In this state, we should find as part of the descrip- 
tion a line of the form 


stat : IF (cond ) stat ELSE stat 


because in this state we will have read and shifted the ELSE. Back in state 23, the alternative 
action, described by “*.”, is to be done if the input symbol is not mentioned explicitly in the 
above actions; thus, in this case, if the input symbol is not ELSE, we should reduce by gram- 
mar rule 18, which is presumably 


stat: IF ‘C cond ’)’ stat 


Notice that the numbers following "shift" commands refer to other states, while the numbers
following "reduce" commands refer to grammar rule numbers. In most states, there will be
only one reduce action possible in the state, and this will always be the default command. The
user who encounters unexpected shift/reduce conflicts will probably want to look at the verbose
output to decide whether the default actions are appropriate. In really tough cases, the
user might need to know more about the behavior and construction of the parser than can be
covered here; in this case, a reference such as [1] might be consulted; the services of a local
guru might also be appropriate.
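C resolves its own dangling-else ambiguity the same way the default rule resolves state 23: an else attaches to the nearest unmatched if, which corresponds to shifting the ELSE rather than reducing by rule 18. The small C function below illustrates that behavior; the name `dangling` is hypothetical and the function is only a demonstration, not part of Yacc.

```c
/* In C, as in a grammar that shifts on ELSE, the else belongs to the
 * inner (nearest) if. Returns 0 if neither assignment runs. */
int dangling(int a, int b)
{
    int x = 0;

    if (a)
        if (b)
            x = 1;
        else            /* pairs with "if (b)", not with "if (a)" */
            x = 2;
    return x;
}
```

With a false outer condition, neither branch runs, which is what the "shift" reading predicts.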

There is one common situation where the rules given above for resolving conflicts are 
not sufficient; this is in the area of arithmetic expressions. Most of the commonly used constructions
for arithmetic expressions can be naturally described by the notion of precedence levels
for operators, together with information about left or right associativity. It turns out that
ambiguous grammars with appropriate disambiguating rules can be used to create parsers which 
are faster and easier to write than parsers constructed from unambiguous grammars. The basic 
notion is to write grammar rules of the form 


	expr : expr OP expr
and
	expr : UNARY expr


for all binary and unary operators desired. This creates a very ambiguous grammar, with many
parsing conflicts. As disambiguating rules, the user specifies the precedence, or binding
strength, of all the operators, and the associativity of the binary operators. This information is
sufficient to allow Yacc to resolve the parsing conflicts in accordance with these rules, and construct
a parser which realizes the desired precedences and associativities.


The precedences and associativities are attached to tokens in the declarations section.
This is done by a series of lines beginning with a Yacc keyword: %left, %right, or %nonassoc,
followed by a list of tokens. All of the tokens on the same line are assumed to have the same
precedence level and associativity; the lines are listed in order of increasing precedence or
binding strength. Thus,


	%left '+' '-'
	%left '*' '/'


describes the precedence and associativity of the four arithmetic operators. Plus and minus are
left associative, and have lower precedence than star and slash, which are also left associative.
The keyword %right is used to describe right associative operators, and the keyword %nonassoc
is used to describe operators, like the operator .LT. in Fortran, which may not associate with
themselves; thus,


	A .LT. B .LT. C


is illegal in Fortran, and such an operator would be described with the keyword %nonassoc in
Yacc. As an example of the behavior of these declarations, the description


	%right '='
	%left '+' '-'
	%left '*' '/'

	%%

	expr :
		expr '=' expr |
		expr '+' expr |
		expr '-' expr |
		expr '*' expr |
		expr '/' expr |
		NAME ;


might be used to structure the input

	a = b = c*d - e - f*g

as follows:

	a = ( b = ( ((c*d)-e) - (f*g) ) )
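To see concretely how the three declaration lines produce this grouping, the sketch below applies the same precedence table with a different technique, precedence climbing (a hand-written recursive parser, not Yacc's LR tables), and fully parenthesizes its input. The function names, and the restriction of operands to single letters, are assumptions made for brevity.

```c
#include <stdio.h>
#include <string.h>

/* Precedence levels mirroring the declarations
 *   %right '='        level 1
 *   %left  '+' '-'    level 2
 *   %left  '*' '/'    level 3
 * Level 0 means "not a binary operator". */
static int prec(char op)
{
    switch (op) {
    case '=':           return 1;
    case '+': case '-': return 2;
    case '*': case '/': return 3;
    default:            return 0;
    }
}

static int right_assoc(char op) { return op == '='; }

static const char *in;              /* cursor into the input string */

static void skip_blanks(void)
{
    while (*in == ' ')
        in++;
}

/* Parse an expression whose operators all have level >= min_prec and
 * write it, fully parenthesized, into out (size n). Operands are single
 * letters; the input is assumed to be well formed. */
static void parse(int min_prec, char *out, size_t n)
{
    char lhs[256], rhs[256], tmp[256];

    skip_blanks();
    lhs[0] = *in;                   /* the operand */
    lhs[1] = '\0';
    if (*in != '\0')
        in++;

    for (;;) {
        skip_blanks();
        char op = *in;
        int p = prec(op);
        if (p == 0 || p < min_prec)
            break;
        in++;
        /* Left associative: the right operand may hold only tighter
         * operators (p + 1). Right associative: equal ones too (p). */
        parse(right_assoc(op) ? p : p + 1, rhs, sizeof rhs);
        snprintf(tmp, sizeof tmp, "(%s%c%s)", lhs, op, rhs);
        strcpy(lhs, tmp);
    }
    snprintf(out, n, "%s", lhs);
}

void parenthesize(const char *src, char *out, size_t n)
{
    in = src;
    parse(1, out, n);
}
```

Called on "a = b = c*d - e - f*g", this yields the grouping shown above with an extra outer pair of parentheses: "(a=(b=(((c*d)-e)-(f*g))))".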


When this mechanism is used, unary operators must, in general, be given a precedence. An
interesting situation arises when we have a unary operator and a binary operator which have
the same symbolic representation, but different precedences. An example is unary and binary
'-'; frequently, unary minus is given the same strength as multiplication, or even higher, while
binary minus has a lower strength than multiplication. We can indicate this situation by use of
another keyword, %prec, to change the precedence level associated with a particular grammar
rule. %prec appears immediately after the body of the grammar rule, before the action or closing
semicolon, and is followed by a token name or literal; it causes the precedence of the
grammar rule to become that of the token name or literal. Thus, to make unary minus have
the same precedence as multiplication, we might write:
	%left '+' '-'
	%left '*' '/'

	%%

	expr :
		expr '+' expr |
		expr '-' expr |
		expr '*' expr |
		expr '/' expr |
		'-' expr %prec '*' |
		NAME ;


Notice that the precedences which are described by %left, %right, and %nonassoc are
independent of the declarations of token names by %token. A symbol can be declared by %token,
and, later in the declarations section, be given a precedence and associativity by one of
the above methods. It is true, however, that names which are given a precedence or associativity
are also declared to be token names, and so in general do not need to be declared by
%token, although it does not hurt to do so.


As we mentioned above, the precedences and associativities are used by Yacc to resolve
parsing conflicts; they give rise to disambiguating rules. Formally, the rules work as follows:


1. The precedences and associativities are recorded for those tokens and literals which have
them.


2. A precedence and associativity is associated with each grammar rule; it is the precedence
and associativity of the last token or literal in the body of the rule. If the %prec construction
is used, it overrides this default. Notice that some grammar rules may have no
precedence and associativity associated with them.


3. When there is a reduce/reduce conflict, or there is a shift/reduce conflict and either the 
input symbol or the grammar rule, or both, has no precedence and associativity associated 
with it, then the two disambiguating rules given at the beginning of the section are used, 
and the conflicts are reported. 


4. If there is a shift/reduce conflict, and both the grammar rule and the input character have
precedence and associativity associated with them, then the conflict is resolved in favor of 
the action (shift or reduce) associated with the higher precedence. If the precedences are 
the same, then the associativity is used; left associative implies reduce, right associative 
implies shift, and nonassociating implies error. 
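Rules 3 and 4 amount to a small decision procedure. The C sketch below encodes it for a single shift/reduce conflict; the type and function names are hypothetical, precedence levels are assumed to be numbered upward in the order the declaration lines appear, and reduce/reduce conflicts (always handled by rule 3) are left out.

```c
/* Associativity as recorded from %left, %right, %nonassoc;
 * NONE means the token or rule has no precedence at all. */
enum assoc  { NONE, LEFT, RIGHT, NONASSOC };
enum action { UNRESOLVED, SHIFT, REDUCE, ERROR };

/* A higher level means a tighter binding (a later declaration line). */
struct prec { int level; enum assoc assoc; };

/* Decide between reducing by a rule and shifting the input token.
 * UNRESOLVED means the default disambiguating rules apply and the
 * conflict is reported (rule 3). */
enum action resolve_sr(struct prec rule, struct prec tok)
{
    if (rule.assoc == NONE || tok.assoc == NONE)
        return UNRESOLVED;                  /* rule 3 */
    if (rule.level > tok.level)
        return REDUCE;                      /* rule 4: higher precedence wins */
    if (tok.level > rule.level)
        return SHIFT;
    switch (rule.assoc) {                   /* equal precedence: associativity */
    case LEFT:  return REDUCE;
    case RIGHT: return SHIFT;
    default:    return ERROR;               /* %nonassoc */
    }
}
```

For example, after reading "a + b" with "*" as the next input, the rule for '+' has a lower level than the token '*', so the parser shifts, and the multiplication is grouped first.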


There are a number of points worth making about this use of disambiguation. There is 
no reporting of conflicts which are resolved by this mechanism, and these conflicts are not 
counted in the number of shift/reduce and reduce/reduce conflicts found in the grammar. 
This means that occasionally mistakes in the specification of precedences disguise errors in the 
input grammar; it is a good idea to be sparing with precedences, and use them in an essentially 
“cookbook” fashion, until some experience has been gained. Frequently, not enough operators 
or precedences have been specified; this leads to a number of messages about shift/reduce or 
reduce/reduce conflicts. The cure is usually to specify more precedences, or use the %prec
mechanism, or both. It is generally good to examine the verbose output file to ensure that the 
conflicts which are being reported can be validly resolved by precedence. 
Section 5: Error Handling


Error handling is an extremely difficult area, and many of the problems are semantic
ones. When an error is found, for example, it may be necessary to reclaim parse tree storage,
delete or alter symbol table entries, and, typically, set switches to avoid generating any further
output.


It is generally not acceptable to stop all processing when an error is found; we wish to
continue scanning the input to find any further syntax errors. This leads to the problem of
getting the parser "restarted" after an error. The general class of algorithms to do this involves
reading ahead and discarding a number of tokens from the input string, and attempting to adjust
the parser so that input can continue.


To allow the user some control over this process, Yacc provides a simple, but reasonably
general, feature. The token name "error" is reserved for error handling. This name can be
used in grammar rules; in effect, it suggests places where errors are expected, and recovery
might take place. The parser attempts to find the last time in the input when the special token
"error" is permitted. The parser then behaves as though it saw the token name "error" as an
input token, and attempts to parse according to the rule encountered. The token at which the
error was detected remains the next input token after this error token is processed. If no special
error rules have been specified, the processing effectively halts when an error is detected.


In order to prevent a cascade of error messages, the parser assumes that, after detecting 
an error, it remains in error state until three tokens have been successfully read and shifted. If 
an error is detected when the parser is already in error state, no error message is given, and the 
input token is quietly deleted. 
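The three-token rule can be modelled with a simple counter. The sketch below is a simulation, not Yacc's parser code: it reads a hypothetical trace of events, where 's' stands for a token successfully read and shifted and 'e' for a detected syntax error, and counts how many error messages would be given.

```c
/* Model of the error-state rule: after a reported error the parser stays
 * in error state until three tokens have been shifted; an error detected
 * in that state produces no message (the token is quietly deleted). */
int count_reported_errors(const char *trace)
{
    int errstate = 0;       /* shifts still needed to leave error state */
    int reported = 0;

    for (; *trace != '\0'; trace++) {
        if (*trace == 's') {
            if (errstate > 0)
                errstate--;
        } else if (*trace == 'e') {
            if (errstate == 0) {
                reported++;         /* a message is given */
                errstate = 3;
            }
            /* otherwise the offending token is deleted silently */
        }
    }
    return reported;
}
```

In the trace "eses" the second error comes after only one successful shift, so it is suppressed and only one message is counted.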


As a common example, the user might include a rule of the form

	statement : error ;


in his specification. This would, in effect, mean that on a syntax error the parser would attempt
to skip over the statement in which the error was seen. (Notice, however, that it may
be difficult or impossible to tell the end of a statement, depending on the other grammar 
rules). More precisely, the parser will scan ahead, looking for three tokens that might legally 
follow a statement, and start processing at the first of these; if the beginnings of statements are 
not sufficiently distinctive, it may make a false start in the middle of a statement, and end up 
reporting a second error where there is in fact no error. 


The user may supply actions after these special grammar rules, just as after the other 
grammar rules. These actions might attempt to reinitialize tables, reclaim symbol table space,
etc. 


The above form of grammar rule is very general, but somewhat difficult to control. 
Somewhat easier to deal with are rules of the form 


	statement : error ';' ;


Here, when there is an error, the parser will again attempt to skip over the statement, but in
this case will do so by skipping to the next ";". All tokens after the error and before the next
";" give syntax errors, and are discarded. When the ";" is seen, this rule will be reduced, and
any "cleanup" action associated with it will be performed.
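The effect of this rule on the token stream can be pictured as a skip-to-delimiter loop. The C sketch below only models which tokens are discarded; representing tokens as single characters, and the name `skip_past_semicolon`, are assumptions for illustration.

```c
#include <stddef.h>

/* Given the token stream and the index at which a syntax error was
 * detected, return the index of the first token after the next ';'.
 * Tokens from error_at up to that ';' are the ones discarded. If no ';'
 * follows, the end of the stream is returned. */
size_t skip_past_semicolon(const char *tokens, size_t error_at)
{
    size_t i = error_at;

    while (tokens[i] != '\0' && tokens[i] != ';')
        i++;                    /* discarded token */
    if (tokens[i] == ';')
        i++;                    /* ';' completes "statement : error ';'" */
    return i;
}
```

For the stream "x=1;y=+;z=2;" with an error detected at the '+' (index 6), parsing resumes at 'z' (index 8).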


Still another form of error rule arises in interactive applications, where we may wish to 
prompt the user who has incorrectly input a line, and allow him to reenter the line. In C we 
might write: 


	inputline : error '\n' prompt inputline
		= { $$ = $4; } ;

	prompt : /* matches no input */
		= { printf( "Reenter last line: " ); } ;


There is one difficulty with this approach; the parser must correctly process three input tokens 
before it is prepared to admit that it has correctly resynchronized after the error. Thus, if the 
reentered line contains errors in the first two tokens, the parser will simply delete the offending 
tokens, and give no message; this is clearly unacceptable. For this reason, there is a mechan- 
ism in both C and Ratfor which can be used to force the parser to believe that resynchroniza- 
tion has taken place. One need only include a statement of the form 


yyerrok ; 


in his action after such a grammar rule, and the desired effect will take place; this name will be
expanded, using the #define mechanism of C or the "define" mechanism of Ratfor, into an
appropriate code sequence. For example, in the situation discussed above where we want to 
prompt the user to produce input, we probably want to consider that the original error has 
been recovered when we have thrown away the previous line, including the newline. In this 
case, we can reset the error state before putting out the prompt message. The grammar rule 
for the nonterminal symbol prompt becomes: 


	prompt : /* matches no input */
		= {
			yyerrok;
			printf( "Reenter last line: " );
			} ;
There is another special feature which the user may wish to use in error recovery. As
mentioned above, the token seen immediately after the “error” symbol is the input token at 
which the error was discovered. Sometimes, this is seen to be inappropriate; for example, an 
error recovery action might take upon itself the job of finding the correct place to resume in- 
put. In this case, the user wishes a way of clearing the previous input token held in the parser. 
One need only include a statement of the form 


yyclearin ; 


in his action; again, this expands, in both C and Ratfor, to the appropriate code sequence. For 
example, suppose the action after error were to call some sophisticated resynchronization 
routine, supplied by the user, which attempted to advance the input to the beginning of the 
next valid statement. After this routine was called, the next token returned by yylex would 
presumably be the first token in a legal statement; we wish to throw away the old, illegal token,
and reset the error state. We might do this by the sequence:


	statement : error
		= {
			resynch( );
			yyerrok ;
			yyclearin ;
			} ;
These mechanisms are admittedly crude, but do allow for a simple, fairly effective
recovery of the parser from many errors, and have the virtue that the user can get "handles"
by which he can deal with the error actions required by the lexical and output portions of the
system.
Section 6C: The C Language Yacc Environment

The default mode of operation in Yacc is to write actions and the lexical analyzer in C.
This has a number of advantages; primarily, it is easier to write character handling routines,
such as the lexical analyzer, in a language which supports character-by-character I/O, and has
shifting and masking operators.

When the user inputs a specification to Yacc, the output is a file of C programs, called
"y.tab.c". These are then compiled, and loaded with a library; the library has default versions
of a number of useful routines. This section discusses these routines, and how the user can 
write his own routines if desired. The name of the Yacc library is system dependent; see Ap- 
pendix B. 


The subroutine produced by Yacc is called "yyparse"; it is an integer valued function.
When it is called, it in turn repeatedly calls "yylex", the lexical analyzer supplied by the user
(see Section 3), to obtain input tokens. Eventually, either an error is detected, in which case 
(if no error recovery is possible) yyparse returns the value 1, or the lexical analyzer returns the 
endmarker token (type number 0), and the parser accepts. In this case, yyparse returns the 
value 0. 


Three of the routines on the Yacc library are concerned with the “external” environment 
of yyparse. There is a default “main” program, a default “initialization” routine, and a default 
“accept” routine, respectively. They are so simple that they will be given here in their entirety:

	main( argc, argv )
	int argc;
	char *argv[];
	{
		yyinit( argc, argv );
		if( yyparse( ) )
			return;
		yyaccpt( );
	}

	yyinit( ) { }

	yyaccpt( ) { }


By supplying his own versions of yyinit and/or yyaccpt, the user can get control either before 
the parser is called (to set options, open input files, etc.) or after the accept action has been 
done (to close files, call the next pass of the compiler, etc.). Note that yyinit is called with the 
two “command line” arguments which have been passed into the main program. If neither of 
these routines is redefined, the default situation simply looks like a call to the parser, followed 
by the termination of the program. Of course, in many cases the user will wish to supply his 
own main program: for example, this is necessary if the parser is to be called more than once. 


The other major routine on the library is called "yyerror"; its main purpose is to write
out a message when a syntax error is detected. It has a number of hooks and handles which
attempt to make this error message general and easy to understand. This routine is somewhat 
more complex, but still approachable: 
	extern int yyline;	/* input line number */

	yyerror(s)
	char *s;
	{
		extern int yychar;
		extern char *yysterm[ ];

		printf("\n%s", s );
		if( yyline )
			printf(", line %d,", yyline );
		printf(" on input: ");
		if( yychar >= 0400 )
			printf("%s\n", yysterm[yychar-0400] );
		else switch ( yychar ) {
			case '\t': printf( "\\t\n" ); return;
			case '\n': printf( "\\n\n" ); return;
			case '\0': printf( "$end\n" ); return;
			default: printf( "%c\n", yychar ); return;
		}
	}


The argument to yyerror is a string containing an error message; most usually, it is "syntax
error". yyerror also uses the external variables yyline, yychar, and yysterm. yyline is a line
number which, if set by the user to a nonzero number, will be printed out as part of the error
message. yychar is a variable which contains the type number of the current token. yysterm
has the names, supplied by the user, for all the tokens which have names. Thus, the routine
spends most of its time trying to print out a reasonable name for the input token. The biggest
problem with the routine as given is that, on UNIX, the error message does not go out on the
error file (file 2). This is hard to arrange in such a way that it works with both the portable I/O
library and the system I/O library; if a way can be worked out, the routine will be changed to
do this. Beware: this routine will not work if any token names have been given redefined type
numbers. In this case, the user must supply his own yyerror routine. Hopefully, this
"feature" will disappear soon.
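On a present-day C system that provides the standard I/O library, writing the message to the error file is straightforward. The sketch below keeps the structure of the routine above but formats the token description into a buffer and writes to `stderr`. The globals `yyline`, `yychar`, and `yysterm` are defined locally as stand-ins (the table contents are invented), and `describe_token` is a hypothetical helper.

```c
#include <stdio.h>

/* Stand-ins for the externals used by the library routine. */
int yyline = 0;                     /* input line number; 0 = unknown */
int yychar = 0;                     /* type number of the current token */
const char *yysterm[] = { "NAME", "NUMBER" };  /* invented names for 0400, 0401 */

/* Put a readable name for token type ch into buf (size n). */
void describe_token(int ch, char *buf, size_t n)
{
    size_t nterm = sizeof yysterm / sizeof yysterm[0];

    if (ch >= 0400 && (size_t)(ch - 0400) < nterm)
        snprintf(buf, n, "%s", yysterm[ch - 0400]);
    else if (ch >= 0400)
        snprintf(buf, n, "token %d", ch);      /* no name known */
    else if (ch == '\t')
        snprintf(buf, n, "\\t");
    else if (ch == '\n')
        snprintf(buf, n, "\\n");
    else if (ch == '\0')
        snprintf(buf, n, "$end");
    else
        snprintf(buf, n, "%c", ch);
}

/* Same message as the library yyerror, but written to file 2 (stderr). */
void yyerror(const char *s)
{
    char tok[64];

    describe_token(yychar, tok, sizeof tok);
    if (yyline)
        fprintf(stderr, "\n%s, line %d, on input: %s\n", s, yyline, tok);
    else
        fprintf(stderr, "\n%s on input: %s\n", s, tok);
}
```

The bounds check on `yysterm` also avoids indexing past the table when a token has no name, although it does not address redefined type numbers.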


Finally, there is another feature which the C user of Yacc might wish to use. The integer 
variable yydebug is normally set to 0. If it is set to 1, the parser will output a verbose description
of its actions, including a discussion of which input symbols have been read, and what the
parser actions are. Depending on the operating environment, it may be possible to set this 
variable by using a debugging system. 


Section 6R: The Ratfor Language Yacc Environment 


For reasons of portability or compatibility with existing software, it may be desired to use 
Yacc to generate parsers in Ratfor, or, by extension, in portable Fortran. The user is likely to
work considerably harder doing this than he might if he were to use C. 


When the user inputs a specification to Yacc, and specifies the Ratfor option (see Appendix B),
the output is a file of Ratfor programs called "y.tab.r". These programs are then compiled,
and provide the desired subroutine.


The subroutine produced by Yacc which does the input process is an integer function
called "yypars". When it is called, it in turn repeatedly calls "yylex", the lexical analyzer supplied
by the user (see Section 3). Eventually, either an error is detected, in which case (if no
error recovery is possible) yypars returns the value 1, or the lexical analyzer returns the end- 
marker (type number 0), and the parser accepts. In this case, yypars returns 0. 
Unlike the C program situation (see Section 6C) there is no library of Ratfor routines
which must be used in the loading process. As a side effect of this, the user must supply a main
program which calls yypars. A suggested Ratfor main program is


	integer yypars
	n = yypars(0)
	if( n .EQ. 0 ) {
		... here if the program accepted
		}
	else {
		... here if there were unrecoverable errors
		}
	end


Notice that there is no easy way for the user to get control when an error is detected, since the 
Fortran language provides only a very crude character string capability. 


There is another feature which the Ratfor user might wish to use. The argument to 
yypars is normally 0. If it is set to 1, the parser will output a verbose description of its actions,
including a discussion of which input symbols have been read, and what the parser actions are.
During the input process, the value of this debug flag is kept in a common variable yydebu,
which is available to the actions and may be set and reset at will. 


Statement labels 1 through 1000 are reserved for the parser, and may not appear in actions;
note that, because Ratfor has a more modern control structure than Fortran, it is rarely
necessary to use statement labels at all; the most frequent use of labels in Ratfor is in formatted
I/O.


Because Fortran has no standard character set and not even a standard character width, it 
is difficult to produce a lexical analyzer in portable Fortran. The usual solution is to provide a
routine which does a table search to get the internal type number for each input character, 
with the understanding that such a routine can be recoded to run far faster for any particular 
machine. 
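The table-search idea is not specific to Fortran. It is sketched here in C for illustration, with character classes and type numbers invented for the example; a Ratfor version would search an integer array in the same way. Because it never depends on character codes or their ordering, it is portable but slow, which is exactly the trade-off described above.

```c
#include <string.h>

/* Hypothetical internal type numbers for input characters. */
enum { T_OTHER, T_DIGIT, T_LETTER, T_OP };

static const struct {
    const char *chars;      /* characters in this class */
    int type;               /* internal type number */
} classes[] = {
    { "0123456789", T_DIGIT },
    { "abcdefghijklmnopqrstuvwxyz", T_LETTER },
    { "+-*/=()", T_OP },
};

/* Return the internal type number of c by searching the table. */
int char_type(char c)
{
    if (c == '\0')
        return T_OTHER;     /* strchr would match the string terminator */
    for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++)
        if (strchr(classes[i].chars, c) != NULL)
            return classes[i].type;
    return T_OTHER;
}
```

A machine-specific replacement could index a 128- or 256-entry array by the character code instead of searching.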


Finally, we must warn the user that the Ratfor feature of Yacc has been operational for a 
much shorter time than the other portions of the system. If past experience is any guide, the 
Ratfor support will develop and become more powerful and better human engineered in 
response to user complaints and requirements. Thus, the potential Ratfor user might do well 
to contact the author to discuss his own particular needs. 


Section 7: Hints for Preparing Specifications


This section contains miscellaneous hints on preparing efficient, easy to change, and clear 
specifications. The individual subsections are, more or less, independent; the reader seeing 
Yacc for the first time may well find that this entire section could be omitted. 


Input Style 


It is difficult to input rules with substantial actions and still have a readable specification 
file. The following style hints owe much to Brian Kernighan, and are officially endorsed by the 
author. 


a. Use all capital letters for token names, all lower case letters for nonterminal names. This 
rule comes under the heading of "knowing who to blame when things go wrong."


b. Put grammar rules and actions on separate lines. This allows either to be changed 
without an automatic need to change the other. 
c. Put all rules with the same left hand side together. Put the left hand side in only once,
and let all following rules begin with a vertical bar.

d. Indent rule bodies by one tab stop, and action bodies by two tab stops.


The example in Appendix A is written following this style, as are the examples in the 
text of this paper (where space permits). The user must make up his own mind about these 
stylistic questions; the central problem, however, is to make the rules visible through the 
morass of action code. 


Common Actions 
When several grammar rules have the same action, the user might well wish to provide 
only one code sequence. A simple, general mechanism is, of course, to use subroutine calls. It
is also possible to put a label on the first statement of an action, and let other actions be simply 
a goto to this label. Thus, if the user had a routine which built trees, he might wish to have 
only one call to it, as follows: 
	expr :
		expr '+' expr
			= { binary:
				$$ = btree( $1, $2, $3 );
				} |
		expr '-' expr
			= { goto binary; } |
		expr '*' expr
			= { goto binary; } ;


Left Recursion 


The algorithm used by the Yacc parser encourages so called "left recursive" grammar
rules: rules of the form 


name : name rest_of_rule ; 


These rules frequently arise when writing specifications of sequences and lists: 


	list :
		item |
		list ',' item ;
and
	sequence :
		item |
		sequence item ;
Notice that, in each of these cases, the first rule will be reduced for the first item only, and the 
second rule will be reduced for the second and all succeeding items. 
If the user were to write these rules right recursively, such as 
	sequence :
		item |
		item sequence ;
the parser would be a bit bigger, and the items would be seen, and reduced, from right to left.
More seriously, an internal stack in the parser would be in danger of overflowing if a very long
sequence were read. Thus, the user should use left recursion wherever reasonable.
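The difference in stack use can be seen by counting. The C sketch below is a simplified simulation of reading n items (n of at least 1) under each form: it tracks only the number of grammar symbols on the parser stack, not parser states, and the function names are hypothetical.

```c
/* Left recursive:  sequence : item | sequence item ;
 * Each item is reduced into "sequence" as soon as it is shifted,
 * so the stack never holds more than "sequence item". */
int max_depth_left(int n)
{
    int depth = 0, max = 0;

    for (int i = 0; i < n; i++) {
        depth++;                    /* shift item */
        if (depth > max)
            max = depth;
        depth = 1;                  /* reduce to a single "sequence" */
    }
    return max;
}

/* Right recursive:  sequence : item | item sequence ;
 * No reduction is possible until the last item has been seen,
 * so every item sits on the stack at once. */
int max_depth_right(int n)
{
    int depth = 0, max = 0;

    for (int i = 0; i < n; i++) {
        depth++;                    /* shift item; nothing can be reduced yet */
        if (depth > max)
            max = depth;
    }
    /* At the end of input the reductions run from right to left and
     * only shrink the stack, so they cannot raise the maximum. */
    return max;
}
```

For 1000 items the left recursive form needs at most 2 stack entries, while the right recursive form needs 1000.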


The user should also consider whether a sequence with zero elements has any meaning, 
and if so, consider writing the sequence specification with an empty rule: 


	sequence :
		/* empty */ |
		sequence item ;


Once again, the first rule would always be reduced exactly once, before the first item was read, 
and then the second rule would be reduced once for each item read. Experience suggests that 
permitting empty sequences leads to increased generality, which frequently is not evident at 
the time the rule is first written. There are cases, however, when the Yacc algorithm can fail 
when such a change is made. In effect, conflicts might arise when Yacc is asked to decide 
which empty sequence it has seen, when it hasn't seen enough to know! Nevertheless, this 
principle is still worth following wherever possible. 


Lexical Tie-ins 


Frequently, there are lexical decisions which depend on the presence of various constructions
in the specification. For example, the lexical analyzer might want to delete blanks normally,
but not within quoted strings. Or names might be entered into a symbol table in declarations,
but not in expressions.


One way of handling these situations is to create a global flag which is examined by the 
lexical analyzer, and set by actions. For example, consider a situation where we have a pro- 
gram which consists of 0 or more declarations, followed by 0 or more statements. We declare a 
flag called "dflag", which is 1 during declarations, and 0 during statements. We may do this as
follows: 


	%{
		int dflag ;
	%}
	%%
	program :
		decls stats ;

	decls :
		/* empty */
			= {
				dflag = 1;
				} |
		decls declaration ;

	stats :
		/* empty */
			= {
				dflag = 0;
				} |
		stats statement ;

		... other rules ...


The flag dflag is now set to zero when reading statements, and 1 when reading declarations, except
for the first token in the first statement. This token must be seen by the parser before it can
tell that the declaration section has ended and the statements have begun. Frequently, however,
this single token exception does not affect the lexical scan required.


Clearly, this kind of “backdoor” approach can be elaborated on to a noxious degree. 
Nevertheless, it represents a way of doing some things that are difficult, if not impossible, to do 
otherwise. 
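On the lexical side, the flag is simply consulted wherever the decision is made. The C sketch below shows the declarations-versus-expressions case mentioned above: names are entered into a symbol table only while `dflag` is 1. The table layout and the function `lex_name` are hypothetical; a real lexical analyzer would call something like it for each name it reads.

```c
#include <string.h>

int dflag = 1;              /* set by grammar actions: 1 in declarations, 0 in statements */

#define MAXSYM 100
static const char *symtab[MAXSYM];  /* hypothetical symbol table */
static int nsym = 0;

/* Look up a name; enter it only while reading declarations.
 * Returns the symbol index, or -1 for an undeclared name in a statement.
 * The caller must keep the name string alive. */
int lex_name(const char *name)
{
    for (int i = 0; i < nsym; i++)
        if (strcmp(symtab[i], name) == 0)
            return i;
    if (dflag && nsym < MAXSYM) {
        symtab[nsym] = name;
        return nsym++;
    }
    return -1;
}
```

Because of the one-token exception noted above, the first name of the first statement may still be looked up while `dflag` is 1.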


Bundling 


Bundling is a technique for collecting together various character strings so that they can 
be output at some later time. It is derived from a feature of the same name in the 
compiler/compiler TMG [6]. 


Bundling has two components — a nice user interface, and a clever implementation trick. 
They will be discussed in that order. 


The user interface consists of two routines, "bundle" and "bprint".

	bundle( a1, a2, ..., an )

accepts a variable number of arguments which are either character strings or bundles, and returns
a bundle, whose value will be the concatenation of the values of a1, ..., an.


bprint( b ) 


accepts a bundle as argument and outputs its value. 


For example, suppose that we wish to read arithmetic expressions, and output function
calls to routines called "add", "sub", "mul", "div", and "assign". Thus, we wish to translate

	a = b - c*d

into

	assign(a,sub(b,mul(c,d)))


A Yacc specification file which does this is given in Appendix D; this includes an imple- 
mentation of the bundle and bprint routines. A rule and action of the form 


	expr :
		expr '+' expr
			= { $$ = bundle( "add(", $1, ",", $3, ")" ); }


causes the returned value of expr to become a bundle, whose value is the character string containing
the desired function call. Each NAME token has a value which is a pointer to the actual
name which has been read. Finally, when the entire input line has been read and its
value has been bundled, the value is written out and the bundles and names are cleared, in
preparation for the next input line.


Bundles are implemented as arrays of pointers, terminated by a zero pointer. Each 
pointer either points to a bundle or to a character string. There is an array, called bundle space,
which contains all the bundles. 


The implementation trick is to check the values of the pointers in bundles — if the 
pointer points into bundle space, it is assumed to point to a bundle; otherwise it is assumed to 
point to a character string. 


The treatment of functions with a variable number of arguments, like bundle, is likely to 
differ from one implementation of C to another. 




In general, one may wish to have a simple storage allocator which allocates and frees
bundles, in order to handle situations where it is not appropriate to completely clear all of bundle
space at one time.
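Modern C handles variable argument lists portably with `<stdarg.h>`. The sketch below is one way to realize the design just described; it is not the Appendix D code. Since a C function cannot count its variable arguments, it assumes each call to `bundle` ends with a null sentinel (`BEND`, a name invented here), and it adds a hypothetical `bsprint` that writes the value into a buffer, plus `bclear` to empty bundle space.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Strings and bundles are both passed around as character pointers;
 * BEND terminates argument lists and bundles. */
typedef const char *bitem;
#define BEND ((bitem)0)

#define BSIZE 1000
static bitem bspace[BSIZE];         /* bundle space: all bundles live here */
static size_t bnext = 0;            /* first unused slot */

/* The implementation trick: a pointer to a slot in bundle space is a
 * bundle; anything else is a string. Equality tests (rather than < and >)
 * keep the check well defined in ISO C. */
static int is_bundle(bitem p)
{
    for (size_t i = 0; i < bnext; i++)
        if (p == (bitem)&bspace[i])
            return 1;
    return 0;
}

/* bundle(a1, a2, ..., an, BEND): each ai is a string or a bundle. */
bitem bundle(bitem first, ...)
{
    va_list ap;
    size_t start = bnext;

    va_start(ap, first);
    for (bitem p = first; p != BEND; p = va_arg(ap, bitem)) {
        assert(bnext < BSIZE - 1);  /* keep room for the terminator */
        bspace[bnext++] = p;
    }
    va_end(ap);
    assert(bnext < BSIZE);
    bspace[bnext++] = BEND;
    return (bitem)&bspace[start];
}

/* Append the value of b to out, never exceeding size - 1 characters. */
static void bcat(bitem b, char *out, size_t size)
{
    if (is_bundle(b)) {
        for (const bitem *q = (const bitem *)b; *q != BEND; q++)
            bcat(*q, out, size);
    } else {
        size_t used = strlen(out);
        if (used + 1 < size)
            strncat(out, b, size - used - 1);
    }
}

void bsprint(bitem b, char *out, size_t size)
{
    if (size == 0)
        return;
    out[0] = '\0';
    bcat(b, out, size);
}

/* bprint: output the value of bundle b. */
void bprint(bitem b)
{
    char buf[1024];

    bsprint(b, buf, sizeof buf);
    fputs(buf, stdout);
}

/* Clear all of bundle space at once, e.g. after each input line. */
void bclear(void)
{
    bnext = 0;
}
```

Building the translation of "a = b - c*d" bottom-up, `bundle("mul(", "c", ",", "d", ")", BEND)` is nested inside a "sub(" bundle, which is nested inside an "assign(" bundle; printing the outer one gives "assign(a,sub(b,mul(c,d)))".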


Reserved Words 


Some programming languages permit the user to use words like "if", which are normally
reserved, as label or variable names, provided that such use does not conflict with the legal use
of these names in the programming language. This is extremely hard to do in the framework
of Yacc, since it is difficult to pass the required information to the lexical analyzer which tells it
"this instance of if is a keyword, and that instance is a variable". The user can make a stab at
it, using the lexical tie-in mechanism described earlier, but it is difficult.


A number of ways of making this easier are under advisement, and one will probably be 
supported eventually. Until this day comes, I suggest that the keywords be reserved: that is, be
forbidden for use as variable names. There are powerful stylistic reasons for preferring this, 
anyway (he said weakly... ). 


Non-integer Values 


Frequently, the user wishes to have values which are bigger than integers; again, this is 
an area where Yacc does not make the job as easy as it might, and some additional support is 
likely. Nevertheless, at the cost of writing a storage manager, the user can return pointers or 
indices to blocks of storage big enough to contain the full values desired. 
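As a sketch of the index approach, assuming the values needed are C doubles, a storage manager can be as small as the following. The names `new_double`, `get_double`, and `pool_reset` are hypothetical, and a real application would need better handling of a full pool.

```c
/* The parser's value stack holds small integer handles; the real values
 * live in a separate pool. */
#define POOLSIZE 256

static double pool[POOLSIZE];
static int pool_next = 0;

/* Store v and return a handle for it, or -1 if the pool is full. */
int new_double(double v)
{
    if (pool_next >= POOLSIZE)
        return -1;
    pool[pool_next] = v;
    return pool_next++;
}

/* Fetch the value behind a handle returned by new_double. */
double get_double(int handle)
{
    return pool[handle];
}

/* Release everything at once, e.g. after each input line. */
void pool_reset(void)
{
    pool_next = 0;
}
```

An action for addition might then read `$$ = new_double( get_double($1) + get_double($3) );`, with `pool_reset` called once per input line.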


Previous Work 


There have been many previous applications of Yacc. The user who is contemplating a 
big application might well find that others have developed relevant techniques, or even portions
of grammars. Yacc specifications appear to be easier to change than the equivalent computer
programs, so that the "prior art" is more relevant here, as well.


Section 8: User Experience, Summary, and Acknowledgements 


Yacc has been used in the construction of a C compiler for the Honeywell 6000, a system 
for typesetting mathematical equations, a low level implementation language for the PDP 11, 
APL and Basic compilers to run under the UNIX system, and a number of other applications. 


To summarize, Yacc can be used to construct parsers; these parsers can interact in a fairly
flexible way with the lexical analysis and output phases of a larger system. The system also
provides an indication of ambiguities in the specification, and allows disambiguating rules to be 
supplied to resolve these ambiguities. 


Because the output of Yacc is largely tables, the system is relatively language indepen- 
dent. In the presence of reasonable applications, Yacc could be modified or adapted to produce 
subroutines for other machines and languages. In addition, we continue to seek better algo- 
rithms to improve the lexical analysis and code generation phases of compilers produced using 
Yacc. 


This document would be incomplete if I did not give credit to a most stimulating collec- 
tion of users, who have goaded me beyond my inclination, and frequently beyond my ability, 
in their endless search for “one more feature”. Their irritating unwillingness to learn how to 
do things my way has usually led to my doing things their way; most of the time, they have
been right. B. W. Kernighan, P. J. Plauger, S. I. Feldman, C. Imagna, M. E. Lesk, and A. 
Snyder will recognize some of their ideas in the current version of Yacc. Al Aho also deserves 
recognition for bringing the mountain to Mohammed, and other favors. 


Appendix A: A Simple Example

This example gives the complete Yacc specification for a small desk calculator. The desk
calculator has 26 registers, labeled a through z, and accepts arithmetic expressions made up of
the operators +, -, *, /, % (mod operator), & (bitwise and), | (bitwise or), and assignment. If
an expression is an assignment at the top level, the value is not printed; otherwise it is. As in 
C, an integer which begins with 0 (zero) is assumed to be octal; otherwise, it is assumed to be 
decimal. 


As an example of a Yacc specification, the desk calculator does a reasonable job of show-
ing the way that precedences and ambiguities are used, as well as showing how simple error
recovery operates. The major oversimplifications are that the lexical analysis phase is much
simpler than for most applications, and the output is produced immediately, line by line. Note
the way that decimal and octal integers are read in by the grammar rules; frequently, this job is
better done by the lexical analyzer.


%token DIGIT LETTER      /* these are token names */

%left '|'                /* declarations of operator precedences */
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS             /* supplies precedence for unary minus */

%{                       /* declarations used by the actions */
int base;
int regs[26];
%}

%%                       /* beginning of rules section */


list :    /* list is the start symbol */
          /* empty */
     |    list stat '\n'
     |    list error '\n'
               = { yyerrok; } ;

stat :    expr
               = { printf( "%d\n", $1 ); }
     |    LETTER '=' expr
               = { regs[$1] = $3; } ;

expr :    '(' expr ')'
               = { $$ = $2; }
     |    expr '+' expr
               = { $$ = $1 + $3; }
     |    expr '-' expr
               = { $$ = $1 - $3; }
     |    expr '*' expr
               = { $$ = $1 * $3; }
     |    expr '/' expr
               = { $$ = $1 / $3; }
     |    expr '%' expr
               = { $$ = $1 % $3; }
     |    expr '&' expr
               = { $$ = $1 & $3; }
     |    expr '|' expr
               = { $$ = $1 | $3; }
     |    '-' expr %prec UMINUS
               = { $$ = - $2; }
     |    LETTER
               = { $$ = regs[$1]; }
     |    number ;

number :  DIGIT
               = { $$ = $1; base = 10; if( $1 == 0 ) base = 8; }
     |    number DIGIT
               = { $$ = base * $1 + $2; } ;

%% /* start of programs */ 
yylex( )       /* lexical analysis routine */
{
     /* returns LETTER for a lower case letter, yylval = 0 through 25 */
     /* return DIGIT for a digit, yylval = 0 through 9 */
     /* all other characters are returned immediately */
     int c;

     while( (c=getchar( )) == ' ' )
          ;    /* skip blanks */

     if( c >= 'a' && c <= 'z' ) {
          yylval = c - 'a';
          return( LETTER );
     }
     if( c >= '0' && c <= '9' ) {
          yylval = c - '0';
          return( DIGIT );
     }
     return( c );
}
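The two number rules can be mirrored in plain C to make the accumulation explicit. The sketch below is illustrative only: eval_number is a hypothetical name, and it assumes its argument begins with a decimal digit. As in the grammar, a leading 0 selects base 8 and each later digit is folded in as value = base * value + digit; also like the grammar, it does not reject the digits 8 and 9 in an octal number.

```c
/* Hypothetical C rendering of the Appendix A rules
 *     number : DIGIT         { $$ = $1; base = 10; if( $1 == 0 ) base = 8; }
 *            | number DIGIT  { $$ = base * $1 + $2; }
 * digits must start with a decimal digit; scanning stops at the first non-digit. */
int eval_number(const char *digits)
{
    int base = 10;
    int value = digits[0] - '0';              /* number : DIGIT */
    if (value == 0)
        base = 8;                             /* a leading zero means octal */
    for (const char *p = digits + 1; *p >= '0' && *p <= '9'; p++)
        value = base * value + (*p - '0');    /* number : number DIGIT */
    return value;
}
```

For example, "017" evaluates to 15 while "17" evaluates to 17.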
Appendix B: Use of Yacc on Unix

Suppose that the Yacc specification is on a file called yfile. If the actions are in C, Yacc is
invoked by

     yacc yfile

The output appears on file y.tab.c. To compile the parser and load it with the Yacc library, use
the command

     cc y.tab.c -ly

If Yacc is invoked with the option -v:

     yacc -v yfile

a verbose description of the parser is produced on file y.output. The C user should consult sec-
tion 6C for more information about the run time environment.
If the actions are in Ratfor, the user should invoke Yacc with the option -r:

     yacc -r yfile

The Ratfor output appears on file y.tab.r. It may be compiled by

     rc -2 y.tab.r

Note that when Yacc is used to produce Ratfor programs, there is no need to load these pro-
grams with any library.
If the -v option is also invoked:

     yacc -rv yfile

a verbose description of the parser is produced on file y.output. The Ratfor user should consult
section 6R for more information about the run time environment.


Appendix C: Old Features Supported but not Encouraged 


This appendix mentions synonyms and features which are supported for historical con- 
tinuity, but, for various reasons, are not encouraged. 


1. Literals may be delimited by double quotes (") as well as single quotes (').


2. Literals may be more than one character long. If all the characters are alphabetic, 
numeric, or _, the type number of the literal is defined, just as if the literal did not have 
the quotes around it. Otherwise, it is difficult to find the value for such literals. 


The use of multi-character literals is likely to mislead those unfamiliar with Yacc, since it 
suggests that Yacc is doing a job which must actually be done by the lexical analyzer.


3. Most places where % is legal, backslash (\) may be used. In particular, \\ is the same as
%%, \left the same as %left, etc. 


4. There are a number of other synonyms: 




%< is the same as %left

%> is the same as %right 

%binary and %2 are the same as %nonassoc 
%0 and %term are the same as %token 

%= is the same as %prec


5. The curly braces { and } around an action are optional if the action consists of a sin-
gle C statement. (They are always required in Ratfor.)
Appendix D: An Example of Bundling
The following program is an example of the technique of bundling; this example is dis- 
cussed in Section 7. 


/* warnings:
   1. This works on Unix; the handling of functions with a variable
      number of arguments is different on different systems.

   2. A number of checks for array bounds have been left out to avoid
      obscuring the basic ideas, but should be there in a practical program.
*/

%token NAME
%right '='
%left '+' '-'
%left '*' '/'

%% 


lines :   /* empty */
               = { bclear( ); }
      |   lines expr '\n'
               = {
                    bprint( $2 );
                    printf( "\n" );
                    bclear( );
                 }
      |   lines error '\n'
               = {
                    bclear( );
                    yyerrok;
                 } ;

expr :    expr '+' expr
               = { $$ = bundle( "add(", $1, ",", $3, ")" ); }
     |    expr '-' expr
               = { $$ = bundle( "sub(", $1, ",", $3, ")" ); }
     |    expr '*' expr
               = { $$ = bundle( "mul(", $1, ",", $3, ")" ); }
     |    expr '/' expr
               = { $$ = bundle( "div(", $1, ",", $3, ")" ); }
     |    '(' expr ')'
               = { $$ = $2; }
     |    NAME '=' expr
               = { $$ = bundle( "assign(", $1, ",", $3, ")" ); }
     |    NAME ;

%%


#define nsize 200
char names[nsize], *nptr = names;

#define bsize 500
int bspace[bsize], *bptr = bspace;

yylex( )
{
     int c;

     c = getchar( );
     while( c == ' ' )
          c = getchar( );
     if( c >= 'a' && c <= 'z' ) {
          yylval = nptr;
          for( ; c >= 'a' && c <= 'z'; c = getchar( ) )
               *nptr++ = c;
          ungetc( c, stdin );
          *nptr++ = '\0';
          return( NAME );
     }
     return( c );
}

bclear( )
{
     nptr = names;
     bptr = bspace;
}

bundle( a1, a2, a3, a4, a5 )
{
     int i, j, *p, *obp;

     p = &a1;
     i = nargs( );
     obp = bptr;
     for( j = 0; j < i; ++j )
          *bptr++ = *p++;
     *bptr++ = 0;
     return( obp );
}

bprint( p )
int *p;
{
     if( p >= bspace && p < &bspace[bsize] )     /* bundle */
          while( *p != 0 )
               bprint( *p++ );
     else printf( "%s", p );
}
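The program above depends on nargs() and on how arguments are laid out in memory, which is why its own warning calls it system dependent. A minimal sketch of the same bundling idea in standard C follows, assuming C99 and <stdarg.h>; the names Slot, S, B, and bflatten are hypothetical. Each slot is tagged as either a string or a nested bundle instead of being recognized by its address, the caller passes the element count in place of nargs(), and bflatten appends the text to a caller-supplied buffer (rather than printing it) so the result can be checked.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* One slot of a bundle: a piece of text, a nested bundle, or (both NULL) the end marker. */
typedef struct Slot {
    const char *str;          /* text piece, or NULL */
    const struct Slot *sub;   /* nested bundle, or NULL */
} Slot;

#define BSIZE 500
static Slot bspace[BSIZE];    /* bundle space, as in Appendix D */
static Slot *bptr = bspace;

static void bclear(void) { bptr = bspace; }

/* Wrap a string or an existing bundle so it can be passed to bundle(). */
static Slot S(const char *s) { Slot x = { s, NULL }; return x; }
static Slot B(const Slot *b) { Slot x = { NULL, b }; return x; }

/* Copy n Slot arguments into bundle space, add an end marker, and return the
 * start of the new bundle. Only pointers are stored; no text is copied.
 * Bounds checks are omitted, as in the original. */
static const Slot *bundle(int n, ...)
{
    Slot *start = bptr;
    va_list ap;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        *bptr++ = va_arg(ap, Slot);
    va_end(ap);
    bptr->str = NULL;         /* end marker */
    bptr->sub = NULL;
    bptr++;
    return start;
}

/* Append the text of bundle b to out, descending into nested bundles. */
static void bflatten(const Slot *b, char *out)
{
    for (; b->str != NULL || b->sub != NULL; b++) {
        if (b->sub != NULL)
            bflatten(b->sub, out);
        else
            strcat(out, b->str);
    }
}
```

Building assign(x,add(a,b)) this way stores only pointers until bflatten walks the structure, which is the point of bundling: the concatenation is deferred until output time.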
Lex helps write programs whose control flow is directed by instances of regular expressions in the in-
put stream. It is well suited for editor-script type transformations and for segmenting input in prepara-
tion for a parsing routine.

Lex source is a table of regular expressions and corresponding program fragments. The table is
translated to a program which reads an input stream, copying it to an output stream and partitioning the
input into strings which match the given expressions. As each such string is recognized the correspond- 
ing program fragment is executed. The recognition of the expressions is performed by a deterministic 
finite automaton generated by Lex. The program fragments written by the user are executed in the ord- 
er in which the corresponding regular expressions occur in the input stream. 

The lexical analysis programs written with Lex accept ambiguous specifications and choose the longest 
match possible at each input point. If necessary, substantial lookahead is performed on the input, but 
the input stream will be backed up to the end of the current partition, so that the user has general free- 
dom to manipulate it. 

Lex can be used to generate analyzers in either C or Ratfor, a language which can be translated au- 
tomatically to portable Fortran. It is available on the PDP-11 UNIX, Honeywell GCOS, and IBM OS 
systems. Lex is designed to simplify interfacing with Yacc, for those with access to this compiler- 
compiler system. 
1 Introduction.


Lex is a program generator designed for lexical process- 
ing of character input streams. It accepts a high-level, 
problem oriented specification for character string match- 
ing, and produces a program in a general purpose 
language which recognizes regular expressions. The regu- 
lar expressions are specified by the user in the source 
specifications given to Lex. The Lex written code recog- 
nizes these expressions in an input stream and partitions 
the input stream into strings matching the expressions. 
At the boundaries between strings program sections pro-
vided by the user are executed. The Lex source file asso-
ciates the regular expressions and the program fragments.
As each expression appears in the input to the program 
written by Lex, the corresponding fragment is executed. 
The user supplies the additional code beyond expres- 
sion matching needed to complete his tasks, possibly in- 
cluding code written by other generators. The program 
that recognizes the expressions is generated in the general 
purpose programming language employed for the user’s 
program fragments. Thus, a high level expression 
language is provided to write the string expressions to be 
matched while the user's freedom to write actions is
unimpaired. This avoids forcing the user who wishes to 
use a string manipulation language for input analysis to 
write processing programs in the same and often inap-
propriate string handling language.

Lex is not a complete language, but rather a generator 
representing a new language feature which can be added 
to different programming languages, called ‘‘host 
languages.’’ Just as general purpose languages can pro- 
duce code to run on different computer hardware, Lex 
can write code in different host languages. The host 
language is used for the output code generated by Lex 
and also for the program fragments added by the user. 
Compatible run-time libraries for the different host 
languages are also provided. This makes Lex adaptable to 
different environments and different users. Each applica- 
tion may be directed to the combination of hardware and 
host language appropriate to the task, the user’s back- 
ground, and the properties of local implementations. At 
present there are only two host languages, C [1] and For-
tran (in the form of the Ratfor language [2]). Lex itself
exists on UNIX, GCOS, and OS/370; but the code gen- 
erated by Lex may be taken anywhere the appropriate 
compilers exist. 

Lex turns the user’s expressions and actions (called
source in this memo) into the host general-purpose 
language; the generated program is named yylex. The 
yylex program will recognize expressions in a stream 
(called input in this memo) and perform the specified ac- 
tions for each expression as it is detected. See Figure 1. 

For a trivial example, consider a program to delete 
from the input all blanks or tabs at the ends of lines. 


%%
[ \t]+$   ;


is all that is required. The program contains a %% delim- 
iter to mark the beginning of the rules, and one rule. 
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This rule contains a regular expression which matches 
one or more instances of the characters blank or tab 
(written \t for visibility, in accordance with the C 
language convention) just prior to the end of a line. The 
brackets indicate the character class made of blank and
tab; the + indicates “one or more ...”; and the $ indi-
cates “end of line,” as in QED. No action is specified, so
the program generated by Lex (yylex) will ignore these 
characters. Everything else will be copied. To change any 
remaining string of blanks or tabs to a single blank, add 
another rule: 


%%
[ \t]+$   ;
[ \t]+    printf(" ");


The finite automaton generated for this source will scan 
for both rules at once, observing at the termination of the 
string of blanks or tabs whether or not there is a newline 
character, and executing the desired rule action. The first 
rule matches all strings of blanks or tabs at the end of 
lines, and the second rule all remaining strings of blanks 
or tabs. 
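For comparison, here is a hand-written C function that does what the two-rule Lex program does; squeeze is a hypothetical name, and it works on strings rather than on a stream. A Lex-generated scanner takes the longest run of blanks and tabs and then picks a rule according to whether a newline follows; the loop below does the same by consuming the whole run before looking at the next character. The sketch assumes out has room for strlen(in) + 1 characters.

```c
/* Hand-written equivalent of
 *     %%
 *     [ \t]+$   ;
 *     [ \t]+    printf(" ");
 * out must have room for strlen(in) + 1 characters. */
void squeeze(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == ' ' || *in == '\t') {
            while (*in == ' ' || *in == '\t')
                in++;                 /* take the whole run: longest match */
            if (*in != '\n')
                *out++ = ' ';         /* second rule: the run becomes one blank */
            /* first rule: the run ends the line, so nothing is written */
        } else {
            *out++ = *in++;           /* unmatched text is copied unchanged */
        }
    }
    *out = '\0';
}
```

A run at the very end of the input, with no newline after it, is not matched by the $ rule, so it becomes a single blank, as the second rule specifies.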


Lex can be used alone for simple transformations, or 
for analysis and statistics gathering on a lexical level. Lex 
can also be used with a parser generator to perform the 
lexical analysis phase; it is particularly easy to interface
Lex and Yacc [3]. Lex programs recognize only regular 
expressions; Yacc writes parsers that accept a large class 
of context free grammars, but require a lower level 
analyzer to recognize input tokens. Thus, a combination 
of Lex and Yacc is often appropriate. When used as a 
preprocessor for a later parser generator, Lex is used to 
partition the input stream, and the parser generator as- 
signs structure to the resulting pieces. The flow of con- 
trol in such a case (which might be the first half of a 
compiler, for example) is shown in Figure 2. Additional 
programs, written by other generators or by hand, can be 
added easily to programs written by Lex. Yacc users will 
realize that the name yylex is what Yacc expects its lexical 
analyzer to be named, so that the use of this name by 
Lex simplifies interfacing. 


Lex generates a deterministic finite automaton from the 
regular expressions in the source [4]. The automaton is 
interpreted, rather than compiled, in order to save space. 
The result is still a fast analyzer. In particular, the time 
taken by a Lex program to recognize and partition an in-
put stream is proportional to the length of the input. The
number of Lex rules or the complexity of the rules is not 
important in determining speed, unless rules which in- 
clude forward context require a significant amount of re- 
scanning. What does increase with the number and com- 
plexity of rules is the size of the finite automaton, and 
therefore the size of the program generated by Lex. 

In the program written by Lex, the user’s fragments 
(representing the actions to be performed as each regular 
expression is found) are gathered as cases of a switch (in 
C) or branches of a computed GOTO (in Ratfor). The 
automaton interpreter directs the control flow. Opportun- 
ity is provided for the user to insert either declarations or 
additional statements in the routine containing the ac- 
tions, or to add subroutines outside this action routine. 

Lex is not limited to source which can be interpreted 
on the basis of one character lookahead. For example, if 
there are two rules, one looking for ab and another for
abcdefg, and the input stream is abcdefh, Lex will recog-
nize ab and leave the input pointer just before cd....
Such backup is more costly than the processing of simpler 
languages. 
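A small C sketch of this choice, restricted to rules that are literal strings; longest_literal is a hypothetical helper, not part of Lex. Comparing the input against abcdefg has to look as far as the h before it fails, yet the result is the two-character match ab, so the caller resumes just before cd. The strict > comparison keeps the earlier rule when two rules match the same length, the tie-breaking rule described in section 5.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the longest literal in rules[0..n-1] that is a prefix
 * of in, storing its length in *len; return -1 (and *len = 0) if none match.
 * On equal lengths the rule listed first is kept. */
int longest_literal(const char *in, const char *const rules[], int n, size_t *len)
{
    int best = -1;
    size_t best_len = 0;
    for (int r = 0; r < n; r++) {
        size_t m = strlen(rules[r]);
        /* strncmp reports a mismatch if in is shorter than rules[r] */
        if (m > best_len && strncmp(rules[r], in, m) == 0) {
            best = r;
            best_len = m;
        }
    }
    *len = best_len;
    return best;
}
```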


2 Lex Source. 
The general format of Lex source is: 


{definitions} 

%% 

{rules} 

%% 

{user subroutines} 


where the definitions and the user subroutines are often 
omitted. The second %% is optional, but the first is re-
quired to mark the beginning of the rules. The absolute
minimum Lex program is thus 


%% 


(no definitions, no rules) which translates into a program 
which copies the input to the output unchanged. 

In the outline of Lex programs shown above, the rules
represent the user’s control decisions; they are a table, in 
which the left column contains regular expressions (see 
section 3) and the right column contains actions, program 
fragments to be executed when the expressions are recog- 
nized. Thus an individual rule might appear 

integer   printf("found keyword INT");
to look for the string integer in the input stream and print
the message “found keyword INT” whenever it appears.
In this example the host procedural language is C and the 
C library function printf is used to print the string. The 
end of the expression is indicated by the first blank or tab 
character. If the action is merely a single C expression, it 
can just be given on the right side of the line; if it is com- 
pound, or takes more than a line, it should be enclosed in 
braces. As a slightly more useful example, suppose it is
desired to change a number of words from British to 
American spelling. Lex rules such as 


colour      printf("color");
mechanise   printf("mechanize");
petrol      printf("gas");


would be a start. These rules are not quite enough, since 
the word petroleum would become gaseum; a way of deal-
ing with this will be described later.


3 Lex Regular Expressions. 


The definitions of regular expressions are very similar 
to those in QED [5]. A regular expression specifies a set 
of strings to be matched. It contains text characters 
(which match the corresponding characters in the strings 
being compared) and operator characters (which specify 
repetitions, choices, and other features). The letters of 
the alphabet and the digits are always text characters; thus 
the regular expression 


integer 


matches the string integer wherever it appears and the ex- 
pression 


a57D 


looks for the string a57D. 
Operators. The operator characters are

" \ [ ] ^ - ? . * + | ( ) $ / { } % < >

and if they are to be used as text characters, an escape
should be used. The quotation mark operator (") indi-
cates that whatever is contained between a pair of quotes
is to be taken as text characters. Thus


xyz"++"


matches the string xyz++ when it appears. Note that a
part of a string may be quoted. It is harmless but un-
necessary to quote an ordinary text character; the expres-
sion


"xyz++"


is the same as the one above. Thus by quoting every 
non-alphanumeric character being used as a text charac- 
ter, the user can avoid remembering the list above of 
current operator characters, and is safe should further ex- 
tensions to Lex lengthen the list. 

An operator character may also be turned into a text 
character by preceding it with \ as in 


xyz\+\+ 


which is another, less readable, equivalent of the above
expressions. Another use of the quoting mechanism is to
get a blank into an expression; normally, as explained 
above, blanks or tabs end a rule. Any blank character not 
contained within [] (see below) must be quoted. The 
usual C escapes with \ are recognized: \n is newline, \t is 
tab, \r is return, and \b is backspace. To enter \ itself, 
use \\. Since newline is illegal in an expression, \n must
be used; it is not required to escape tab and backspace.
Every character but blank, tab, newline and the list above 
is always a text character. 

Character classes. Classes of characters can be
specified using the operator pair []. The construction
[abc] matches a single character, which may be a, b, or c.
Within square brackets, most operator meanings are ig-
nored. Only three characters are special: these are \, -,
and ^. The - character indicates ranges. For example,


[a-z0-9<>_]


indicates the character class containing all the lower case 
letters, the digits, the angle brackets, and underline. 
Ranges may be given in either order. Using - between
any pair of characters which are not both upper case
letters, both lower case letters, or both digits is imple-
mentation dependent and will get a warning message.
(E.g., [0-z] in ASCII is many more characters than it is in
EBCDIC.) If it is desired to include the character - in a
character class, it should be first or last; thus


[-+0-9]


matches all the digits and the two signs. 

In character classes, the ^ operator must appear as the
first character after the left bracket; it indicates that the 
resulting string is to be complemented with respect to the 
computer character set. Thus 


[^abc]


matches all characters except a, b, or c, including all spe- 
cial or control characters; or 


[^a-zA-Z]


is any character which is not a letter. The \ character pro- 
vides the usual escapes within character class brackets. 
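How a class is tested can be sketched in C. The hypothetical in_class below takes the text between the brackets and a character; it handles a leading ^, ranges given in either order, and a - placed first or last, but, to stay short, not \ escapes. It compares character codes directly, so, as noted above, a range that is not within the letters or digits depends on the character set.

```c
#include <stdbool.h>

/* Does c belong to the class whose body (the text between [ and ]) is cls?
 * Handles a leading ^, ranges in either order, and a - first or last.
 * Backslash escapes are NOT handled in this sketch. */
bool in_class(const char *cls, int c)
{
    bool negate = false, found = false;
    if (*cls == '^') {                         /* complement the class */
        negate = true;
        cls++;
    }
    for (const char *p = cls; *p != '\0'; p++) {
        if (p[1] == '-' && p[2] != '\0') {     /* a range such as a-z */
            int lo = (unsigned char)p[0], hi = (unsigned char)p[2];
            if (lo > hi) {                     /* ranges may be given in either order */
                int t = lo; lo = hi; hi = t;
            }
            if (c >= lo && c <= hi)
                found = true;
            p += 2;                            /* skip the - and the range end */
        } else if ((unsigned char)*p == c) {   /* a single character, including a lone - */
            found = true;
        }
    }
    return found != negate;
}
```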

Arbitrary character. To match almost any character, 
the operator character

.

is the class of all characters except newline. Escaping into
octal is possible although non-portable: 


[\40-\176]


matches all printable characters in the ASCII character 
set, from octal 40 (blank) to octal 176 (tilde). 

Optional expressions. The operator ? indicates an op- 
tional element of an expression. Thus 
ab?c

matches either ac or abc.
Repeated expressions. Repetitions of classes are indicat-
ed by the operators * and +.

a*


is any number of consecutive a characters, including zero;
while

a+

is one or more instances of a. For example,
[a-z]+
is all strings of lower case letters. And
[A-Za-z][A-Za-z0-9]*


indicates all alphanumeric strings with a leading alphabetic 
character. This is a typical expression for recognizing 
identifiers in computer languages.
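The same test can be written directly in C. The hypothetical ident_len below returns the length of the longest prefix of s matching [A-Za-z][A-Za-z0-9]*, or 0 if s does not begin with a letter. It assumes a character set, such as ASCII, in which the letters and the digits form contiguous ranges.

```c
#include <stddef.h>

static int is_letter(int c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
static int is_digit(int c)  { return c >= '0' && c <= '9'; }

/* Length of the longest prefix of s matching [A-Za-z][A-Za-z0-9]*, or 0. */
size_t ident_len(const char *s)
{
    size_t n;
    if (!is_letter((unsigned char)s[0]))
        return 0;                    /* must start with a letter */
    for (n = 1; is_letter((unsigned char)s[n]) || is_digit((unsigned char)s[n]); n++)
        ;                            /* extend while letters or digits follow */
    return n;
}
```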

Alternation and Grouping. The operator | indicates 
alternation: 


(ab|cd) 


matches either ab or cd. Note that parentheses are used
for grouping, although they are not necessary on the out- 
side level. 


ab|cd 


would have sufficed. Parentheses can be used for more 
complex expressions: 


(ab|cd+)?(ef)*


matches such strings as abefef, efefef, cdef, or cddd; but
not abc, abcd, or abcdef.

Context sensitivity. Lex will recognize a small amount 
of surrounding context. The two simplest operators for
this are ^ and $. If the first character of an expression is
^, the expression will only be matched at the beginning of
a line (after a newline character, or at the beginning of
the input stream). This can never conflict with the other
meaning of ^, complementation of character classes, since
that only applies within the [] operators. If the very last
character is $, the expression will only be matched at the
end of a line (when immediately followed by newline).
The latter operator is a special case of the / operator char- 
acter, which indicates trailing context. The expression 


ab/cd 


matches the string ab, but only if followed by cd. Thus


ab$ 
is the same as 
ab/\n 


Left context is handled in Lex by start conditions as ex-
plained in section 10. If a rule is only to be executed
when the Lex automaton interpreter is in start condition 
x, the rule should be prefixed by 


<x> 


using the angle bracket operator characters. If we con-
sidered “being at the beginning of a line” to be start con-
dition ONE, then the ^ operator would be equivalent to


<ONE> 


Start conditions are explained more fully later. 

Repetitions and Definitions. The operators {} specify ei-
ther repetitions (if they enclose numbers) or definition
expansion (if they enclose a name). For example 


{digit}


looks for a predefined string named digit and inserts it at 
that point in the expression. The definitions are given in 
the first part of the Lex input, before the rules. In con- 
trast, 


a{1,5} 


looks for 1 to 5 occurrences of a. 
Finally, initial % is special, being the separator for Lex 
source segments. 


4 Lex Actions. 


When an expression written as above is matched, Lex 
executes the corresponding action. This section describes 
some features of Lex which aid in writing actions. Note 
that there is a default action, which consists of copying 
the input to the output. This is performed on all strings 
not otherwise matched. Thus the Lex user who wishes to 
absorb the entire input, without producing any output, 
must provide rules to match everything. When Lex is be- 
ing used with Yacc, this is the normal situation. One may 
consider that actions are what is done instead of copying 
the input to the output; thus, in general, a rule which 
merely copies can be omitted. Also, a character combina- 
tion which is omitted from the rules and which appears as 
input is likely to be printed on the output, thus calling at-
tention to the gap in the rules.

One of the simplest things that can be done is to ignore 
the input. Specifying a C null statement, ; as an action 
causes this result. A frequent rule is 


[ \t\n]   ;

which causes the three spacing characters (blank, tab, and
newline) to be ignored.

Another easy way to avoid writing actions is the action
character |, which indicates that the action for this rule is
the action for the next rule. The previous example could
also have been written

" "    |
"\t"   |
"\n"   ;

with the same result, although in different style. The
quotes around \n and \t are not required.

In more complex actions, the user will often want to 
know the actual text that matched some expression like 
[a-z]+. Lex leaves this text in an external character ar-
ray named yytext. Thus, to print the name found, a rule
like

[a-z]+   printf("%s", yytext);
will print the string in yytext. The C function printf ac-
cepts a format argument and data to be printed; in this
case, the format is “print string” (% indicating data
conversion, and s indicating string type), and the data are
the characters in yytext. So this just places the matched
string on the output. This action is so common that it
may be written as ECHO:
[a-z]+   ECHO;

is the same as the above. Since the default action is just 
to print the characters found, one might ask why give a 
rule, like this one, which merely specifies the default ac- 
tion? Such rules are often required to avoid matching 
some other rule which is not desired. For example, if 
there is a rule which matches read it will normally match
the instances of read contained in bread or readjust; to
avoid this, a rule of the form [a-z]+ is needed. This is
explained further below. 

Sometimes it is more convenient to know the end of 
what has been found: hence Lex also provides a count 
yyleng of the number of characters matched. To count 
both the number of words and the number of characters 
in words in the input, the user might write 

[a-zA-Z]+   {words++; chars += yyleng;}
which accumulates in chars the number of characters in
the words recognized. The last character in the string
matched can be accessed by

yytext[yyleng-1]
in C or
yytext(yyleng)

in Ratfor.
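For comparison, a hypothetical C function that produces the same two counts: each maximal run of letters plays the part of one match of [a-zA-Z]+, and its length plays the part of yyleng. Unlike the Lex program, it skips the other characters instead of copying them to the output, and it assumes the letters form contiguous ranges, as in ASCII.

```c
#include <stddef.h>

static int is_alpha(int c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

/* Count words (maximal runs of letters) and the letters in them. */
void count_words(const char *s, long *words, long *chars)
{
    *words = 0;
    *chars = 0;
    while (*s != '\0') {
        size_t len = 0;                      /* plays the role of yyleng */
        while (is_alpha((unsigned char)s[len]))
            len++;
        if (len > 0) {                       /* [a-zA-Z]+ matched */
            (*words)++;
            *chars += (long)len;
            s += len;
        } else {
            s++;                             /* any other character: skipped */
        }
    }
}
```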


Occasionally, a Lex action may decide that a rule has 
not recognized the correct span of characters. Two rou- 
tines are provided to aid with this situation. First, 
yymore() can be called to indicate that the next input ex- 
pression recognized is to be tacked on to the end of this 
input. Normally, the next input string would overwrite 
the current entry in yytext. Second, yyless (n) may be 
called to indicate that not all the characters matched by 
the currently successful expression are wanted right now. 
The argument n indicates the number of characters in
yytext to be retained. Further characters previously 
matched are returned to the input. This provides the 
same sort of lookahead offered by the / operator, but in a 
different form. 

Example: Consider a language which defines a string as 
a set of characters between quotation (") marks, and pro- 
vides that to include a " in a string it must be preceded by 
a \. The regular expression which matches that is some- 
what confusing, so that it might be preferable to write 


\"[^"]*   {
          if (yytext[yyleng-1] == '\\')
               yymore();
          else
               ... normal user processing
          }

which will, when faced with a string such as "abc\"def"
first match the five characters "abc\; then the call to
yymore() will cause the next part of the string, "def, to be
tacked on the end. Note that the final quote terminating
the string should be picked up in the code labeled “nor-
mal processing”.
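The effect of yymore() in this rule can be imitated in ordinary C. The hypothetical scan_string below assumes s starts at the opening quote. It repeatedly takes the longest run matching \"[^"]*, and while that run ends in a backslash it appends the next run, whose leading quote is the escaped one, much as yymore() appends the next match to yytext. The returned length excludes the closing quote, which is left for the normal processing step.

```c
#include <stddef.h>

/* s must begin with a double quote. Returns the length of the token the
 * Lex rule above accumulates: everything up to, but not including, the
 * closing quote. */
size_t scan_string(const char *s)
{
    size_t len = 0;                               /* plays the role of yyleng */
    do {
        len++;                                    /* the quote that starts this match */
        while (s[len] != '\0' && s[len] != '"')
            len++;                                /* then any non-quote characters */
    } while (s[len] == '"' && s[len - 1] == '\\');  /* escaped quote: keep going, like yymore() */
    return len;
}
```

On the text's example, the first pass stops after the five characters "abc\ and the second pass extends the token through def, giving 9.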

The function yyless() might be used to reprocess text in
various circumstances. Consider the C problem of distin-
guishing the ambiguity of “=-a”. Suppose it is desired
to treat this as “=- a” but print a message. A rule
might be

=-[a-zA-Z]   {
          printf("Operator (=-) ambiguous\n");
          yyless(yyleng-1);
          ... action for =- ...
          }


which prints a message, returns the letter after the opera-
tor to the input stream, and treats the operator as “=-”.
Alternatively it might be desired to treat this as “= -a”.
To do this, just return the minus sign as well as the letter
to the input:

=-[a-zA-Z]   {
          printf("Operator (=-) ambiguous\n");
          yyless(yyleng-2);
          ... action for = ...
          }


will perform the other interpretation. Note that the ex- 
pressions for the two cases might more easily be written 
=-/[A-Za-z]
in the first case and
=/-[A-Za-z]

in the second; no backup would be required in the rule
action. It is not necessary to recognize the whole
identifier to observe the ambiguity. The possibility of
“=-3”, however, makes

=-/[^ \t\n]
a still better rule.


In addition to these routines, Lex also permits access to 
the I/O routines it uses. They are: 


1) input() which returns the next input character;
2) output(c) which writes the character c on the out-
put; and
3) unput(c) pushes the character c back onto the in-
put stream to be read later by input().
By default these routines are provided as macro
definitions, but the user can override them and supply
private versions. There is another important routine in 
Ratfor, named lexshf, which is described below under
“Character Set”. These routines define the relationship
between external files and internal characters, and must 
all be retained or modified consistently. They may be 
redefined, to cause input or output to be transmitted to or 
from strange places, including other programs or internal 
memory; but the character set used must be consistent in 
all routines; a value of zero returned by input must mean
end of file; and the relationship between unput and input 
must be retained or the Lex lookahead will not work. 
Lex does not look ahead at all if it does not have to, but 
every rule ending in + * ? or $ or containing / implies 
lookahead. Lookahead is also necessary to match an ex- 
pression that is a prefix of another expression. See below 
for a discussion of the character set used by Lex. The 
standard Lex library imposes a 100 character limit on 
backup. 

Another Lex library routine that the user will some-
times want to redefine is yywrap() which is called when-
ever Lex reaches an end-of-file. If yywrap returns a 1,
Lex continues with the normal wrapup on end of input. 
Sometimes, however, it is convenient to arrange for more 
input to arrive from a new source. In this case, the user 
should provide a yywrap which arranges for new input 
and returns 0. This instructs Lex to continue processing. 
The default yywrap always returns 1. 

This routine is also a convenient place to print tables, 
summaries, etc. at the end of a program. Note that it is 
not possible to write a normal rule which recognizes end-
of-file; the only access to this condition is through
yywrap. In fact, unless a private version of input() is sup-
plied, a file containing nulls cannot be handled, since a
value of 0 returned by input is taken to be end-of-file. 
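The yywrap() protocol can be modeled in a few lines of C. This is a hypothetical sketch, not Lex's own input routine: the inputs are strings rather than files, and wrap and next_char merely stand in for yywrap() and input(). At each end of input the reader calls wrap; a return of 0 means another source has been arranged and reading continues, while a return of 1 lets the final end of file through, reported as 0 as the text requires of input().

```c
static const char *const *sources;  /* the inputs, in order */
static int nsources;                /* how many there are */
static int cur;                     /* index of the input being read */
static const char *ip;              /* read position within it */

/* Begin reading srcs[0..n-1]; n must be at least 1. */
void start_inputs(const char *const *srcs, int n)
{
    sources = srcs;
    nsources = n;
    cur = 0;
    ip = srcs[0];
}

/* Stand-in for yywrap(): switch to the next input and return 0 if there is
 * one; otherwise return 1 so that end of file is reported. */
static int wrap(void)
{
    if (cur + 1 < nsources) {
        ip = sources[++cur];
        return 0;
    }
    return 1;
}

/* Stand-in for input(): the next character, or 0 at the final end of file. */
int next_char(void)
{
    while (*ip == '\0')
        if (wrap() != 0)
            return 0;
    return (unsigned char)*ip++;
}
```

As in Lex, a character with value 0 inside an input cannot be distinguished from end of file here.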

In Ratfor all of the standard I/O library routines, input,
output, unput, yywrap, and lexshf, are defined as integer
functions. This requires input and yywrap to be called 
with arguments. One dummy argument is supplied and 
ignored. 


5 Ambiguous Source Rules. 


Lex can handle ambiguous specifications. When more 
than one expression can match the current input, Lex 
chooses as follows: 


1) The longest match is preferred. 


2) Among rules which matched the same number of
characters, the rule given first is preferred. 
Thus, suppose the rules 


integer   keyword action ...;
[a-z]+    identifier action ...;


to be given in that order. If the input is integers, it is tak-
en as an identifier, because [a-z]+ matches 8 characters
while integer matches only 7. If the input is integer, both 
rules match 7 characters, and the keyword rule is selected 
because it was given first. Anything shorter (e.g. int) will 
not match the expression integer and so the identifier in- 
terpretation is used. 
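The two disambiguation rules can be checked with a small C sketch that hand-codes the two patterns above. The function pick_rule is hypothetical; it is not part of Lex. It computes the length each rule would match at the start of the input, prefers the longer match, and breaks a tie in favor of the rule listed first.

```c
#include <string.h>

/* Which rule wins: 1 = "integer" (keyword), 2 = [a-z]+ (identifier),
   0 = neither. *len receives the number of characters matched. */
static int pick_rule(const char *s, size_t *len)
{
    size_t kw = strncmp(s, "integer", 7) == 0 ? 7 : 0;    /* rule 1 */
    size_t id = strspn(s, "abcdefghijklmnopqrstuvwxyz");  /* rule 2 */

    if (kw == 0 && id == 0) { *len = 0; return 0; }
    /* Longest match wins; on a tie the earlier rule (keyword) wins. */
    if (kw >= id) { *len = kw; return 1; }
    *len = id;
    return 2;
}
```

With this sketch, "integers" gives rule 2 with 8 characters, "integer" gives rule 1 with 7, and "int" gives rule 2 with 3, matching the cases described in the text.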

The principle of preferring the longest match makes
rules containing expressions like .* dangerous. For
example,

'.*'

might seem a good way of recognizing a string in single
quotes. But it is an invitation for the program to read far 
ahead, looking for a distant single quote. Presented with 
the input 


'first' quoted string here, 'second' here
the above expression will match
'first' quoted string here, 'second'


which is probably not what was wanted. A better rule is
of the form


'[^'\n]*'


which, on the above input, will stop after 'first'. The
consequences of errors like this are mitigated by the fact
that the . operator will not match newline. Thus
expressions like .* stop on the current line. Don't try to
defeat this with expressions like [.\n]+ or equivalents; the
Lex generated program will try to read the entire input
file, causing internal buffer overflows.
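The difference between the two quoted-string rules can be reproduced in C. The sketch below uses two hypothetical helpers. greedy_quote models '.*': because . stops at newline, the longest match ends at the last quote on the current line. bounded_quote models '[^'\n]*' and stops at the first closing quote.

```c
#include <string.h>

/* Length matched by '.*' at s: the match must end at the LAST
   quote before a newline (longest match), or 0 if there is none. */
static size_t greedy_quote(const char *s)
{
    size_t i, last = 0;
    if (s[0] != '\'') return 0;
    for (i = 1; s[i] != '\0' && s[i] != '\n'; i++)
        if (s[i] == '\'')
            last = i + 1;          /* candidate end just after this quote */
    return last;
}

/* Length matched by '[^'\n]*' at s: stops at the FIRST closing quote. */
static size_t bounded_quote(const char *s)
{
    size_t i;
    if (s[0] != '\'') return 0;
    i = 1 + strcspn(s + 1, "'\n");
    return s[i] == '\'' ? i + 1 : 0;
}
```

On the sample input, greedy_quote covers the whole text up to and including the quote after second, while bounded_quote covers only 'first'.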

Note that Lex is normally partitioning the input stream,
not searching for all possible matches of each expression.
This means that each character is accounted for once and
only once. For example, suppose it is desired to count
occurrences of both she and he in an input text. Some
Lex rules to do this might be
she     s++;
he      h++;
\n      |
.       ;


where the last two rules ignore everything besides he and
she. Remember that . does not include newline. Since
she includes he, Lex will normally not recognize the
instances of he included in she, since once it has passed a
she those characters are gone.
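The partitioning behavior can be mimicked by a hand-written C loop. The function count_partition below is hypothetical; it only models the four rules above. At each position it takes the longest rule that matches, consumes those characters, and moves on. As a result, the he inside she is never seen.

```c
#include <string.h>

/* Models  she s++;  he h++;  \n|. ;  : each input character is
   consumed by exactly one match, as in Lex's partitioning. */
static void count_partition(const char *in, int *s, int *h)
{
    *s = *h = 0;
    while (*in != '\0') {
        if (strncmp(in, "she", 3) == 0)     { (*s)++; in += 3; }
        else if (strncmp(in, "he", 2) == 0) { (*h)++; in += 2; }
        else                                  in += 1; /* '.' or '\n' rule */
    }
}
```

For the input "she he", this sketch counts one she and only one he.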


Sometimes the user would like to override this choice.
The action REJECT means "go do the next alternative."
It causes whatever rule was second choice after the
current rule to be executed. The position of the input
pointer is adjusted accordingly. Suppose the user really
wants to count the included instances of he:


she     {s++; REJECT;}
he      {h++; REJECT;}
\n      |
.       ;


these rules are one way of changing the previous example
to do just that. After counting each expression, it is
rejected; whenever appropriate, the other expression will
then be counted. In this example, of course, the user
could note that she includes he but not vice versa, and
omit the REJECT action on he; in other cases, however,
it would not be possible a priori to tell which input
characters were in both classes.
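The effect of REJECT in this example can be approximated in C by trying both patterns at every input position instead of skipping past each match. The function count_overlapping is hypothetical. It is an approximation of the rules above, not a description of how Lex implements REJECT internally.

```c
#include <string.h>

/* Approximates  she {s++; REJECT;}  he {h++; REJECT;} : every
   occurrence is counted, including a "he" that lies inside "she". */
static void count_overlapping(const char *in, int *s, int *h)
{
    *s = *h = 0;
    for (; *in != '\0'; in++) {          /* try a match at every position */
        if (strncmp(in, "she", 3) == 0) (*s)++;
        if (strncmp(in, "he", 2) == 0)  (*h)++;
    }
}
```

For "she he", this sketch counts one she and two he.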


Consider the two rules 


a[bc]+     {...; REJECT;}
a[cd]+     {...; REJECT;}


If the input is ab, only the first rule matches, and on ad
only the second matches. The input string accb matches 
the first rule for four characters and then the second rule 
for three characters. In contrast, the input accd agrees 
with the second rule for four characters and then the first 
rule for three. 
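The match lengths quoted for accb and accd can be verified with a tiny C helper. The helper match_a_set is hypothetical. It returns the length matched by a pattern of the form a[set]+ at the start of a string, or 0 if the pattern does not match.

```c
#include <string.h>

/* Length matched by a[set]+ at the start of s, or 0. */
static size_t match_a_set(const char *s, const char *set)
{
    size_t n;
    if (s[0] != 'a') return 0;
    n = strspn(s + 1, set);          /* one or more characters from set */
    return n > 0 ? n + 1 : 0;
}
```

For accb, a[bc]+ matches 4 characters and a[cd]+ matches 3, which is why the first rule fires and, after REJECT, the second fires too.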


In general, REJECT is useful whenever the purpose of 
Lex is not to partition the input stream but to detect all 
examples of some items in the input, and the instances of 
these items may overlap or include each other. Suppose a
digram table of the input is desired; normally the digrams
overlap, that is, the word the is considered to contain both
th and he. Assuming a two-dimensional array named
digram to be incremented, the appropriate source is


%%
[a-z][a-z]    {digram[yytext[0]][yytext[1]]++; REJECT;}
\n            ;

where the REJECT is necessary to pick up a letter pair 
beginning at every character, rather than at every other 
character. 
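The same overlapping digram count can be written directly in C as a cross-check. The function count_digrams is hypothetical. It assumes that the lower-case letters a to z have contiguous codes, as in ASCII, so that in[0] - 'a' is a valid index. It examines a letter pair starting at every character, which is exactly what the REJECT achieves.

```c
#include <string.h>

/* Counts overlapping lower-case letter pairs: a pair is tried at
   every character, not at every other character. */
static void count_digrams(const char *in, int digram[26][26])
{
    memset(digram, 0, 26 * 26 * sizeof digram[0][0]);
    for (; in[0] != '\0' && in[1] != '\0'; in++)
        if (in[0] >= 'a' && in[0] <= 'z' && in[1] >= 'a' && in[1] <= 'z')
            digram[in[0] - 'a'][in[1] - 'a']++;
}
```

For the word the, this records one th and one he, as the text describes.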


6 Lex Source Definitions. 
Remember the format of the Lex source: 


{definitions}
%%
{rules}
%%
{user routines}


So far only the rules have been described. The user
needs additional options, though, to define variables for
use in his program and for use by Lex. These can go
either in the definitions section or in the rules section.
Remember that Lex is turning the rules into a program.
Any source not intercepted by Lex is copied into the
generated program. There are three classes of such things.


1) Any line which is not part of a Lex rule or action
and which begins with a blank or tab is copied into
the Lex generated program. Such source input prior
to the first %% delimiter will be external to any
function in the code; if it appears immediately
after the first %%, it appears in an appropriate
place for declarations in the function written by
Lex which contains the actions. This material
must look like program fragments, and should
precede the first Lex rule.


As a side effect of the above, lines which begin 
with a blank or tab, and which contain a com- 
ment, are passed through to the generated pro- 
gram. This can be used to include comments in 
either the Lex source or the generated code. The 
comments should follow the host language con- 
vention. 


2) Anything included between lines containing only 
%{ and %} is copied out as above. The delimiters 
are discarded. This format permits entering text 
like preprocessor statements that must begin in 
column 1, or copying lines that do not look like 
programs. 

3) Anything after the second %% delimiter, regardless
of formats, etc., is copied out after the Lex output.

Definitions intended for Lex are given before the first
%% delimiter. Any line in this section not contained
between %{ and %}, and beginning in column 1, is
assumed to define Lex substitution strings. The format of
such lines is


name translation 


and it causes the string given as a translation to be associ- 
ated with the name. The name and translation must be 
separated by at least one blank or tab, and the name must 
begin with a letter. The translation can then be called out 
by the {name} syntax in a rule. Using {D} for the digits
and {E} for an exponent field, for example, might
abbreviate rules to recognize numbers:
D                    [0-9]
E                    [DEde][-+]?{D}+
%%
{D}+                 printf("integer");
{D}+"."{D}*({E})?    |
{D}*"."{D}+({E})?    |
{D}+{E}              printf("real");


Note the first two rules for real numbers; both require a 
decimal point and contain an optional exponent field, but 
the first requires at least one digit before the decimal 
point and the second requires at least one digit after the 
decimal point. To correctly handle the problem posed by 
a Fortran expression such as 35.EQ.I, which does not
contain a real number, a context-sensitive rule such as


[0-9]+/"."EQ    printf("integer");


could be used in addition to the normal rule for integers. 
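The number rules above, including the trailing-context rule for 35.EQ.I, can be modeled with a hand-written C scanner. The function scan_number and its helpers are hypothetical. The scanner computes the length that each rule would match and applies Lex's longest-match choice. For simplicity, the context rule is checked first, since its pattern, counted together with the trailing ".EQ", is the longer match.

```c
#include <ctype.h>
#include <string.h>

/* Number of leading decimal digits in s. */
static size_t digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Length of an exponent field E = [DEde][-+]?{D}+ at s, or 0. */
static size_t exponent_len(const char *s)
{
    size_t i = 1, d;
    if (s[0] == '\0' || strchr("DEde", s[0]) == NULL)
        return 0;
    if (s[i] == '+' || s[i] == '-')
        i++;
    d = digits(s + i);
    return d > 0 ? i + d : 0;
}

/* Returns 1 (integer), 2 (real) or 0 (no number); *len is the
   number of characters consumed. */
static int scan_number(const char *s, size_t *len)
{
    size_t n = digits(s);                 /* {D}+ when n > 0 */
    size_t best = n;
    int kind = n > 0 ? 1 : 0;

    /* [0-9]+/"."EQ : an integer; ".EQ" is left unread as context. */
    if (n > 0 && strncmp(s + n, ".EQ", 3) == 0) {
        *len = n;
        return 1;
    }
    /* {D}+"."{D}*({E})?  and  {D}*"."{D}+({E})? */
    if (s[n] == '.') {
        size_t f = digits(s + n + 1);
        if (n > 0 || f > 0) {
            size_t j = n + 1 + f;
            j += exponent_len(s + j);
            if (j > best) { best = j; kind = 2; }
        }
    }
    /* {D}+{E} */
    if (n > 0) {
        size_t e = exponent_len(s + n);
        if (e > 0 && n + e > best) { best = n + e; kind = 2; }
    }
    *len = best;
    return kind;
}
```

For example, "3.14" and ".5e10" are reported as reals, "12e" as the integer 12 (an exponent needs at least one digit), and "35.EQ.I" as the integer 35.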


The definitions section may also contain other com- 
mands, including the selection of a host language, a char- 
acter set table, a list of start conditions, or adjustments to 
the default size of arrays within Lex itself for larger 
source programs. These possibilities are discussed below 
under "Summary of Source Format," section 12.


7 Usage. 


There are two steps in compiling a Lex source program. 
First, the Lex source must be turned into a generated 
program in the host general purpose language. Then this 
program must be compiled and loaded, usually with a li- 
brary of Lex subroutines. The generated program is on a 
file named lex.yy.c for a C host language source and 
lex.yy.r for a Ratfor host environment. There are two 
I/O libraries, one for C defined in terms of the C stan- 
dard library [6], and the other defined in terms of Ratfor. 
To indicate that a Lex source file is intended to be used 
with the Ratfor host language, make the first line of the 
file %R. 


The C programs generated by Lex are slightly different 
on OS/370, because the OS compiler is less powerful than 
the UNIX or GCOS compilers, and does less at compile 
time. C programs generated on GCOS and UNIX are the 
same. The C host language is default, but may be expli- 
citly requested by making the first line of the source file 
%C. 


The Ratfor generated by Lex is the same on all systems,
but cannot be compiled directly on TSO. See
below for instructions. The Ratfor I/O library, however,
varies slightly because the different Fortrans disagree on
the method of indicating end-of-input and the name of
the library routine for logical AND. The Ratfor I/O
library, dependent on Fortran character I/O, is quite slow.
In particular, it reads all input lines in 80A1 format; this
will truncate any longer line, discarding your data, and
pads any shorter line with blanks. The library version of
input removes the padding (including any trailing blanks
from the original input) before processing. Each source
file using a Ratfor host should begin with the "%R"
command.

UNIX. The libraries are accessed by the loader flags
-llc for C and -llr for Ratfor; the C name may be
abbreviated to -ll. So an appropriate set of commands is

C Host                      Ratfor Host

lex source                  lex source
cc lex.yy.c -ll -lS         rc -2 lex.yy.r -llr
The resulting program is placed on the usual file a.out for 
later execution. To use Lex with Yacc see below. 
Although the default Lex I/O routines use the C standard
library, the Lex automata themselves do not do so; if
private versions of input, output and unput are given, the
library can be avoided. Note the "-2" option in the Ratfor
compile command; this requests the larger version of
the compiler, a useful precaution.

GCOS. The Lex commands on GCOS are stored in the
"." library. The appropriate command sequences are:

C Host                          Ratfor Host

./lex source                    ./lex source
./cc lex.yy.c ./lexclib h=      ./rc a= lex.yy.r ./lexrlib h=

The resulting program is placed on the usual file .program
for later execution (as indicated by the "h=" option); it
may be copied to a permanent file if desired. Note the
"a=" option in the Ratfor compile command; this indicates
that the Fortran compiler is to run in ASCII mode.

TSO. Lex is just barely available on TSO. Restrictions
imposed by the compilers which must be used with its
output make it rather inconvenient. To use the C
version, type


exec 'dot.lex.clist(lex)' 'sourcename'
exec 'dot.lex.clist(cload)' 'libraryname membername'


The first command analyzes the source file and writes a C
program on file lex.yy.text. The second command runs
this file through the C compiler and links it with the Lex
C library (stored on 'hr289.lcl.load'), placing the object
program in your file libraryname.LOAD(membername) as
a completely linked load module. The compiling
command uses a special version of the C compiler command
on TSO which provides an unusually large intermediate
assembler file to compensate for the unusual bulk of
C-compiled Lex programs on the OS system. Even so,
almost any Lex source program is too big to compile, and
must be split.

The same Lex command will compile Ratfor Lex
programs, leaving a file lex.yy.rat instead of lex.yy.text in
your directory. The Ratfor program must be edited,
however, to compensate for peculiarities of IBM Ratfor. A
command sequence to do this, and then compile and
load, is available. The full commands are:


exec 'dot.lex.clist(lex)' 'sourcename'
exec 'dot.lex.clist(rload)' 'libraryname membername'


with the same overall effect as the C language commands. 
However, the Ratfor commands will run in a 150K byte 
partition, while the C commands require 250K bytes to 
operate. 

The steps involved in processing the generated Ratfor 
program are: 

a. Edit the Ratfor program:
   Remove all tabs.
   Change all lower case letters to upper case letters.
   Convert the file to an 80-column card image file.

b. Process the Ratfor through the Ratfor preprocessor
   to get Fortran code.

c. Compile the Fortran.

d. Load with the libraries 'hr289.lrl.load' and
   'sys1.fortlib'.
The final load module will only read input in 80-character 
fixed length records. Warning: Work is in progress on 
the IBM C compiler, and Lex and its availability on the 
IBM 370 are subject to change without notice. 
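Step a of the procedure above can be sketched in C for a single line. The function card_image is hypothetical. It reads "remove all tabs" literally, so it deletes them; expanding each tab to blanks might be safer, because deletion can join adjacent tokens. After removing tabs, it upper-cases the letters and blank-pads the result to 80 columns. The sketch reports an over-long line as an error instead of silently truncating it.

```c
#include <ctype.h>
#include <string.h>

/* Drops tabs, upper-cases letters, pads to 80 columns.
   Returns 0 on success, or -1 if more than 80 characters remain.
   out must hold 81 bytes. */
static int card_image(const char *line, char out[81])
{
    size_t n = 0;
    for (; *line != '\0' && *line != '\n'; line++) {
        if (*line == '\t')
            continue;                  /* remove all tabs */
        if (n == 80)
            return -1;                 /* too long for one card */
        out[n++] = (char)toupper((unsigned char)*line);
    }
    memset(out + n, ' ', 80 - n);      /* blank-pad to column 80 */
    out[80] = '\0';
    return 0;
}
```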


8 Lex and Yacc. 


If you want to use Lex with Yacc, note that what Lex
writes is a program named yylex(), the name required by
Yacc for its analyzer. Normally, the default main
program on the Lex library calls this routine, but if Yacc is
loaded, and its main program is used, Yacc will call
yylex(). In this case each Lex rule should end with


return(token);


where the appropriate token value is returned. An easy
way to get access to Yacc's names for tokens is to compile
the Lex output file as part of the Yacc output file by
placing the line


#include "lex.yy.c"


in the last section of Yacc input. Supposing the grammar
to be named "good" and the lexical rules to be named
"better", the UNIX command sequence can just be:


yacc good
lex better
cc y.tab.c -ly -ll -lS


The Yacc library (-ly) should be loaded before the Lex li- 
brary, to obtain a main program which invokes the Yacc 
parser. The generations of Lex and Yacc programs can be 
done in either order. 
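The calling convention between Yacc and Lex is easier to see in a hand-written stand-in for the generated yylex. The sketch below assumes hypothetical token codes NUMBER and PLUS; in real use these come from Yacc. It also assumes a hypothetical input pointer lexp. It follows the Yacc conventions of returning a token code per call, passing a token's value in yylval, and returning 0 at end of input.

```c
#include <ctype.h>

/* Hypothetical token codes; a real program would take them from Yacc. */
#define NUMBER 257
#define PLUS   258

static const char *lexp;   /* hypothetical input pointer */
int yylval;                /* value passed to the parser */

/* Stand-in for the yylex() that Lex generates: each "rule" ends
   by returning a token, and 0 signals end of input. */
int yylex(void)
{
    while (*lexp == ' ' || *lexp == '\t' || *lexp == '\n')
        lexp++;                           /* skip white space */
    if (*lexp == '\0')
        return 0;                         /* end of input */
    if (isdigit((unsigned char)*lexp)) {
        yylval = 0;
        while (isdigit((unsigned char)*lexp))
            yylval = yylval * 10 + (*lexp++ - '0');
        return NUMBER;                    /* return(token); */
    }
    if (*lexp == '+') { lexp++; return PLUS; }
    return (unsigned char)*lexp++;        /* any other character */
}
```

Given "12 + 3", successive calls return NUMBER (with yylval 12), PLUS, NUMBER (with yylval 3), and finally 0.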


9 Examples. 
As a trivial problem, consider copying an input file 


while adding 3 to every positive number divisible by 7. 
Here is a suitable Lex source program 
%%
        int k;
[0-9]+  {
        scanf(-1, yytext, "%d", &k);
        if (k%7 == 0)
                printf("%d", k+3);
        else
                printf("%d", k);
        }


to do just that. The rule [0-9]+ recognizes strings of
digits; scanf converts the digits to binary and stores the
result in k. The operator % (remainder) is used to check
whether k is divisible by 7; if it is, it is incremented by 3
as it is written out. It may be objected that this program
will alter such input items as 49.63 or X7. Furthermore,
it increments the absolute value of all negative numbers
divisible by 7. To avoid this, just add a few more rules
after the active one, as here:


%%
        int k;
-?[0-9]+        {
        scanf(-1, yytext, "%d", &k);
        printf("%d", k%7 == 0 ? k+3 : k);
        }
-?[0-9.]+               ECHO;
[A-Za-z][A-Za-z0-9]+    ECHO;


Numerical strings containing a "." or preceded by a letter
will be picked up by one of the last two rules, and not
changed. The if-else has been replaced by a C conditional
expression to save space; the form a?b:c means "if a
then b else c".
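To see how longest-match choice makes the extra rules work, here is a hypothetical hand-coded C filter, add3_filter, that models the three rules and Lex's default ECHO for any other character. A digit string that continues with "." is longer under the -?[0-9.]+ rule and is copied unchanged. A tie between the first two rules goes to the first. The sketch assumes every number fits in a long.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Appends n bytes of s at out[o], keeping out NUL-terminated. */
static size_t append(char *out, size_t o, size_t size, const char *s, size_t n)
{
    while (n-- > 0 && o + 1 < size)
        out[o++] = *s++;
    out[o] = '\0';
    return o;
}

/* Models:  -?[0-9]+ (add 3 if divisible by 7),  -?[0-9.]+ ECHO,
   [A-Za-z][A-Za-z0-9]+ ECHO, and the default ECHO of one character. */
static void add3_filter(const char *in, char *out, size_t size)
{
    size_t o = 0;
    out[0] = '\0';
    while (*in != '\0') {
        size_t sign = (*in == '-') ? 1 : 0;
        size_t n1 = strspn(in + sign, "0123456789");   /* rule 1 body */
        size_t n2 = strspn(in + sign, "0123456789.");  /* rule 2 body */
        size_t len = 1;                                /* default ECHO */

        if (n1 > 0 && n1 == n2) {          /* tie: the earlier rule wins */
            char num[32];
            long k = strtol(in, NULL, 10); /* also parses a leading '-' */
            snprintf(num, sizeof num, "%ld", k % 7 == 0 ? k + 3 : k);
            o = append(out, o, size, num, strlen(num));
            in += sign + n1;
            continue;
        }
        if (n2 > 0)                        /* rule 2 is the longer match */
            len = sign + n2;
        else if (isalpha((unsigned char)in[0]) && isalnum((unsigned char)in[1]))
            len = 1 + strspn(in + 1, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                     "abcdefghijklmnopqrstuvwxyz0123456789");
        o = append(out, o, size, in, len);
        in += len;
    }
}
```

On the input "7 14 49.63 X7 -21 5", this sketch produces "10 17 49.63 X7 -18 5".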

For an example of statistics gathering, here is a pro- 
gram which histograms the lengths of words, where a 
word is defined as a string of letters. 


        int lengs[100];
%%
[a-z]+  lengs[yyleng]++;
.       |
\n      ;
%%
yywrap()
{
int i;
printf("Length  No. words\n");
for(i=0; i<100; i++)
        if (lengs[i] > 0)
                printf("%5d%10d\n", i, lengs[i]);
return(1);
}


This program accumulates the histogram, while producing
no output. At the end of the input it prints the table.
The final statement return(1); indicates that Lex is to
perform wrapup. If yywrap returns zero (false), it implies
that further input is available and the program is to
continue reading and processing. To provide a yywrap that
never returns true causes an infinite loop.
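The counting part of this program can be written in plain C for comparison. The function word_lengths and the constant MAXLEN are hypothetical. Unlike the Lex version, which would overflow lengs on a word of 100 or more letters, this sketch ignores such words.

```c
#include <string.h>

#define MAXLEN 100

/* Mirrors  [a-z]+ lengs[yyleng]++;  : counts runs of lower-case
   letters by length; every other character is skipped. */
static void word_lengths(const char *in, int lengs[MAXLEN])
{
    memset(lengs, 0, MAXLEN * sizeof lengs[0]);
    while (*in != '\0') {
        size_t n = strspn(in, "abcdefghijklmnopqrstuvwxyz");
        if (n == 0) { in++; continue; }   /* the '.' and '\n' rules */
        if (n < MAXLEN)
            lengs[n]++;
        in += n;
    }
}
```

For "the cat sat on a mat", the sketch records four words of length 3 and one each of lengths 2 and 1.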

As a larger example, here are some parts of a program
written by N. L. Schryer to convert double precision
Fortran to single precision Fortran. Because Fortran does
not distinguish upper and lower case letters, this routine
begins by defining a set of classes including both cases of
each letter:


a       [aA]
b       [bB]
c       [cC]
...
z       [zZ]


An additional class recognizes white space:
W       [ \t]*


The first rule changes "double precision" to "real", or
"DOUBLE PRECISION" to "REAL".


{d}{o}{u}{b}{l}{e}{W}{p}{r}{e}{c}{i}{s}{i}{o}{n}   {
        printf(yytext[0] == 'd' ? "real" : "REAL");
        }


Care is taken throughout this program to preserve the
case (upper or lower) of the original program. The
conditional operator is used to select the proper form of the
keyword. The next rule copies continuation card
indications to avoid confusing them with constants:

^"     "[^ 0]    ECHO;

In the regular expression, the quotes surround the blanks.
It is interpreted as "beginning of line, then five blanks,
then anything but blank or zero." Note the two different
meanings of ^. There follow some rules to change double
precision constants to ordinary floating constants.


[0-9]+{W}{d}{W}[+-]?{W}[0-9]+              |
[0-9]+{W}"."{W}{d}{W}[+-]?{W}[0-9]+        |
"."{W}[0-9]+{W}{d}{W}[+-]?{W}[0-9]+        {
        /* convert constants */
        char *p;
        for (p = yytext; *p != 0; p++)
                if (*p == 'd' || *p == 'D')
                        *p += 'e' - 'd';
        ECHO;
        }


After the floating point constant is recognized, it is
scanned by the for loop to find the letter d or D. The
program then adds 'e'-'d', which converts it to the next
letter of the alphabet. The modified constant, now
single-precision, is written out again. There follow a
series of names which must be respelled to remove their
initial d. By using the array yytext the same action
suffices for all the names (only a sample of a rather long
list is given here).
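The constant-conversion action relies on d and e being adjacent character codes. The hypothetical C function below applies the same in-place change to a string and preserves case. As in the Lex action, it assumes, as in ASCII, that e follows d and E follows D.

```c
#include <string.h>

/* Changes the exponent letter d/D to e/E in place, keeping case:
   adding 'e' - 'd' (that is, 1) moves to the next letter. */
static void single_precision(char *p)
{
    for (; *p != '\0'; p++)
        if (*p == 'd' || *p == 'D')
            *p += 'e' - 'd';
}
```

For example, "1.5d0" becomes "1.5e0" and "2 D +3" becomes "2 E +3".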
{d}{s}{i}{n}            |
{d}{c}{o}{s}            |
{d}{s}{q}{r}{t}         |
{d}{a}{t}{a}{n}         |
...
{d}{f}{l}{o}{a}{t}      printf("%s", yytext+1);


Another list of names must have initial d changed to
initial a:


{d}{l}{o}{g}            |
{d}{l}{o}{g}10          |
{d}{m}{i}{n}1           |
{d}{m}{a}{x}1           {
        yytext[0] += 'a' - 'd';
        ECHO;
        }


And one routine must have initial d changed to initial r.


{d}1{m}{a}{c}{h}        {yytext[0] += 'r' - 'd'; ECHO;}


To avoid such names as dsinx being detected as instances 
of dsin, some final rules pick up longer words as 
identifiers and copy some surviving characters: 


[A-Za-z][A-Za-z0-9]*    |
[0-9]+                  |
\n                      |
.                       ECHO;


Note that this program is not complete; it does not deal 
with the spacing problems in Fortran or with the use of 
keywords as identifiers. 
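The renaming rules can be summarized in a hypothetical C function, rename_routine, that applies them to one complete identifier. Its tables hold only the sample names shown above, since the real list is longer. Matching is case-insensitive, and the case of the first letter is preserved. A longer word such as dsinx matches none of the names and is copied unchanged, as with the final identifier rule.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive match of a whole word against a lower-case name. */
static int same_word(const char *w, const char *name)
{
    for (; *w != '\0' && *name != '\0'; w++, name++)
        if (tolower((unsigned char)*w) != *name)
            return 0;
    return *w == '\0' && *name == '\0';
}

/* Writes the renamed identifier to out (at least as large as word). */
static void rename_routine(const char *word, char *out)
{
    static const char *const drop_d[] = {"dsin", "dcos", "dsqrt", "datan", "dfloat"};
    static const char *const d_to_a[] = {"dlog", "dlog10", "dmin1", "dmax1"};
    size_t i;

    strcpy(out, word);
    for (i = 0; i < sizeof drop_d / sizeof drop_d[0]; i++)
        if (same_word(word, drop_d[i])) { strcpy(out, word + 1); return; }
    for (i = 0; i < sizeof d_to_a / sizeof d_to_a[0]; i++)
        if (same_word(word, d_to_a[i])) { out[0] += 'a' - 'd'; return; }
    if (same_word(word, "d1mach"))
        out[0] += 'r' - 'd';
    /* Anything else (e.g. dsinx) is left as copied above. */
}
```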


10 Left Context Sensitivity. 


Sometimes it is desirable to have several sets of lexical
rules to be applied at different times in the input. For
example, a compiler preprocessor might distinguish
preprocessor statements and analyze them differently from
ordinary statements. This requires sensitivity to prior
context, and there are several ways of handling such
problems. The ^ operator, for example, is a prior context
operator, recognizing immediately preceding left context
just as / recognizes immediately following right context.
Adjacent left context could be extended, to produce a
facility similar to that for adjacent right context, but it is
unlikely to be as useful, since often the relevant left
context appeared some time earlier, such as at the beginning
of a line.

This section describes three means of dealing with
different environments: a simple use of flags, when only a
few rules change from one environment to another; the
use of start conditions on rules; and the possibility of
making multiple lexical analyzers all run together. In
each case, there are rules which recognize the need to
change the environment in which the following input text
is analyzed, and set some parameter to reflect the change.
This may be a flag explicitly tested by the user's action
code; such a flag is the simplest way of dealing with the
problem, since Lex is not involved at all. It may be more
convenient, however, to have Lex remember the flags as
initial conditions on the rules. Any rule may be
associated with a start condition. It will only be recognized when
Lex is in that start condition. The current start condition
may be changed at any time. Finally, if the sets of rules
for the different environments are very dissimilar, clarity
may be best achieved by writing several distinct lexical
analyzers, and switching from one to another as desired.

Consider the following problem: copy the input to the
output, changing the word magic to first on every line
which began with the letter a, changing magic to second
on every line which began with the letter b, and changing
magic to third on every line which began with the letter c.
All other words and all other lines are left unchanged.

These rules are so simple that the easiest way to do this
job is with a flag:


        int flag;
%%
^a      {flag = 'a'; ECHO;}
^b      {flag = 'b'; ECHO;}
^c      {flag = 'c'; ECHO;}
\n      {flag = 0; ECHO;}
magic   {
        switch (flag)
        {
        case 'a': printf("first"); break;
        case 'b': printf("second"); break;
        case 'c': printf("third"); break;
        default: ECHO; break;
        }
        }


should be adequate. 
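The flag technique can be written out in C to show that Lex itself plays no part in remembering the state. The function magic_filter is hypothetical. It follows the rules above. A line's first letter sets the flag, a newline clears it, and each occurrence of magic is rewritten according to the flag. Like the Lex rules, the sketch also rewrites magic when it appears inside a longer word.

```c
#include <string.h>

/* Hand-coded version of the flag solution above. */
static void magic_filter(const char *in, char *out, size_t size)
{
    int flag = 0;
    int at_line_start = 1;
    size_t o = 0;

    while (*in != '\0') {
        if (at_line_start && (*in == 'a' || *in == 'b' || *in == 'c'))
            flag = *in;                    /* ^a, ^b, ^c set the flag */
        if (*in == '\n')
            flag = 0;                      /* \n clears it */
        if (strncmp(in, "magic", 5) == 0) {
            const char *rep = flag == 'a' ? "first"
                            : flag == 'b' ? "second"
                            : flag == 'c' ? "third" : "magic";
            size_t n = strlen(rep);
            if (o + n >= size) break;      /* output buffer full */
            memcpy(out + o, rep, n);
            o += n;
            in += 5;
            at_line_start = 0;
            continue;
        }
        if (o + 1 >= size) break;
        at_line_start = (*in == '\n');
        out[o++] = *in++;                  /* ECHO */
    }
    out[o] = '\0';
}
```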

To handle the same problem with start conditions, each 
start condition must be introduced to Lex in the 
definitions section with a line reading 


%Start name1 name2 ...


where the conditions may be named in any order. The
word Start may be abbreviated to s or S. The conditions
may be referenced at the head of a rule with the <>
brackets:


<name1>expression
is a rule which is only recognized when Lex is in the start
condition name1. To enter a start condition, execute the
action statement


BEGIN name1;


which changes the start condition to name1. To resume
the normal state,

BEGIN 0;


resets the initial condition of the Lex automaton
interpreter. A rule may be active in several start conditions:


<name1,name2,name3>
is a legal prefix. Any rule not beginning with the <>
prefix operator is always active.


The same example as before can be written: 


%START AA BB CC
%%
^a              {ECHO; BEGIN AA;}
^b              {ECHO; BEGIN BB;}
^c              {ECHO; BEGIN CC;}
\n              {ECHO; BEGIN 0;}
<AA>magic       printf("first");
<BB>magic       printf("second");
<CC>magic       printf("third");


where the logic is exactly the same as in the previous 
method of handling the problem, but Lex does the work 
rather than the user's code. 


11 Character Set.


The programs generated by Lex handle character I/O 
only through the routines input, output, and unput. Thus 
the character representation provided in these routines is 
accepted by Lex and employed to return values in yytext. 
For internal use a character is represented as a small in- 
teger which, if the standard library is used, has a value 
equal to the integer value of the bit pattern representing 
the character on the host computer. In C, the I/O rou- 
tines are assumed to deal directly in this representation. 
In Ratfor, it is anticipated that many users will prefer
left-adjusted rather than right-adjusted characters; thus
the routine lexshf is called to change the representation
delivered by input into a right-adjusted integer. If the
user changes the I/O library, the routine lexshf should
also be changed to a compatible version. The Ratfor
library I/O system is arranged to represent the letter a as
in the Fortran value 1Ha, while in C the letter a is
represented as the character constant 'a'. If this
interpretation is changed, by providing I/O routines which
translate the characters, Lex must be told about it, by
giving a translation table. This table must be in the
definitions section, and must be bracketed by lines
containing only "%T". The table contains lines of the form


{integer} {character string} 


which indicate the value associated with each character. 
Thus the next example maps the lower and upper case
letters together into the integers 1 through 26, newline
into 27, + and - into 28 and 29, and the digits into 30
through 39. Note the escape for newline. If a table is
supplied, every character that is to appear either in the
rules or in any valid input must be included in the table.
No character may be assigned the number 0, and no
character may be assigned a bigger number than the size of
the hardware character set.


%T
 1      Aa
 2      Bb
...
26      Zz
27      \n
28      +
29      -
30      0
31      1
...
39      9
%T

Sample character table.
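The mapping described by the sample %T table can be expressed as a C lookup array. The function build_table is hypothetical and does not show how Lex stores the table internally. It only makes the mapping concrete. Characters that the table does not list get 0, the value that no real character may use. The letters are taken from string literals, so the sketch does not assume contiguous letter codes; C guarantees that the digits are contiguous.

```c
#include <string.h>

/* Aa->1 ... Zz->26, newline->27, + and - -> 28 and 29,
   digits -> 30..39; unlisted characters -> 0. */
static void build_table(int table[256])
{
    int i;
    memset(table, 0, 256 * sizeof table[0]);
    for (i = 0; i < 26; i++) {
        table[(unsigned char)"ABCDEFGHIJKLMNOPQRSTUVWXYZ"[i]] = i + 1;
        table[(unsigned char)"abcdefghijklmnopqrstuvwxyz"[i]] = i + 1;
    }
    table['\n'] = 27;
    table['+'] = 28;
    table['-'] = 29;
    for (i = 0; i < 10; i++)
        table['0' + i] = 30 + i;
}
```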


It is not likely that C users will wish to use the charac- 
ter table feature; but for Fortran portability it may be 
essential. 


Although the contents of the Lex Ratfor library routines
for input and output run almost unmodified on
UNIX, GCOS, and OS/370, they are not really machine
independent, and would not work with CDC or
Burroughs Fortran compilers. The user is of course welcome
to replace input, output, unput and lexshf, but to replace
them by completely portable Fortran routines is likely to
cause a substantial decrease in the speed of Lex Ratfor
programs. A simple way to produce portable routines
would be to leave input and output as routines that read
with 80A1 format, but replace lexshf by a table lookup
routine.


12 Summary of Source Format. 
The general form of a Lex source file is: 


{definitions}
%%
{rules}
%%
{user subroutines}


The definitions section contains a combination of
1) Definitions, in the form "name space translation".
2) Included code, in the form "space code".
3) Included code, in the form

        %{
        code
        %}

4) Start conditions, given in the form

        %S name1 name2 ...

5) Character set tables, in the form

        %T
        number space character-string
        %T

6) A language specifier, which must also precede any
   rules or included code, in the form "%C" for C
   or "%R" for Ratfor.
7) Changes to internal array sizes, in the form

        %x nnn

   where nnn is a decimal integer representing an
   array size and x selects the parameter as follows:

        Letter    Parameter
        p         positions
        n         states
        e         tree nodes
        a         transitions
        k         packed character classes
        o         output array size


Lines in the rules section have the form "expression
action" where the action may be continued on succeeding
lines by using braces to delimit it.

Regular expressions in Lex use the following operators: 


x          the character "x"
"x"        an "x", even if x is an operator.
\x         an "x", even if x is an operator.
[xy]       the character x or y.
[x-z]      the characters x, y or z.
[^x]       any character but x.
.          any character but newline.
^x         an x at the beginning of a line.
<y>x       an x when Lex is in start condition y.
x$         an x at the end of a line.
x?         an optional x.
x*         0,1,2, ... instances of x.
x+         1,2,3, ... instances of x.
x|y        an x or a y.
(x)        an x.
x/y        an x but only if followed by y.
{xx}       the translation of xx from the definitions section.
x{m,n}     m through n occurrences of x.


13 Caveats and Bugs. 


There are pathological expressions which produce
exponential growth of the tables when converted to
deterministic machines; fortunately, they are rare.

REJECT does not rescan the input; instead it
remembers the results of the previous scan. This means
that if a rule with trailing context is found, and REJECT
executed, the user must not have used unput to change
the characters forthcoming from the input stream. This is
the only restriction on the user's ability to manipulate the
not-yet-processed input.

TSO Lex is an older version. Among the non-supported
features are REJECT, start conditions, and variable
length trailing context. And any significant Lex
source is too big for the IBM C compiler when translated.
ABSTRACT


Although Fortran is not a pleasant language to use, it does have the advantages of universality and
(usually) relative efficiency. The Ratfor language attempts to conceal the main deficiencies of Fortran
while retaining its desirable qualities, by providing decent control flow statements:


statement grouping
if-else and switch for decision-making
while, for, do, and repeat-until for looping
break and next for controlling loop exits


and some "syntactic sugar":


free form input (multiple statements/line, automatic continuation)
unobtrusive comment convention
translation of >, >=, etc., into .GT., .GE., etc.
return(expression) statement for functions
define statement for symbolic parameters
include statement for including source files


Ratfor is implemented as a preprocessor which translates this language into Fortran. 


Once the control flow and cosmetic deficiencies of Fortran are hidden, the resulting language is
remarkably pleasant to use. Ratfor programs are markedly easier to write, and to read, and thus easier to
debug, maintain and modify than their Fortran equivalents.


It is readily possible to write Ratfor programs which are portable to other environments. Ratfor is
written in itself in this way, so it is also portable; versions of Ratfor are now running on at least a dozen
different types of computers at over one hundred locations.


This paper discusses design criteria for a Fortran preprocessor, the Ratfor language and its
implementation, and user experience.
RATFOR — A Preprocessor for a Rational Fortran


Brian W. Kernighan 


Bell Laboratories 
Murray Hill, New Jersey 07974 


1. INTRODUCTION 


Most programmers will agree that Fortran
is an unpleasant language to program in, yet
there are many occasions when they are forced
to use it. For example, Fortran is often the only
language thoroughly supported on the local
computer. Indeed, it is the closest thing to a
universal programming language currently available:
with care it is possible to write large, truly
portable Fortran programs [1]. Finally, Fortran is
often the most "efficient" language available,
particularly for programs requiring much
computation.


But Fortran is unpleasant. Perhaps the 
worst deficiency is in the control flow statements 
— conditional branches and loops — which 
express the logic of the program. The condi- 
tional statements in Fortran are primitive. The 
Arithmetic IF forces the user into at least two
statement numbers and two (implied) GOTO's; it
leads to unintelligible code, and is eschewed by
good programmers. The Logical IF is better, in
that the test part can be stated clearly, but
hopelessly restrictive because the statement that
follows the IF can only be one Fortran statement
(with some further restrictions!). And of course
there can be no ELSE part to a Fortran IF: there is
no way to specify an alternative action if the IF is
not satisfied. 


The Fortran DO restricts the user to going
forward in an arithmetic progression. It is fine
for "1 to N in steps of 1 (or 2 or ...)", but there
is no direct way to go backwards, or even (in
ANSI Fortran [2]) to go from 1 to N-1. And of
course the DO is useless if one's problem doesn't
map into an arithmetic progression.


The result of these failings is that Fortran 
programs must be written with numerous labels 
and branches. The resulting code is particularly 
difficult to read and understand, and thus hard to 
debug and modify. 


When one is faced with an unpleasant 
language, a useful technique is to define a new 
language that overcomes the deficiencies, and to 
translate it into the unpleasant one with a 
preprocessor. This is the approach taken with 


Ratfor. (The preprocessor idea is of course not
new, and preprocessors for Fortran are especially
popular today. A recent listing [3] of
preprocessors shows more than 50, of which at least half a
dozen are widely available.)


2. LANGUAGE DESCRIPTION 


Design 


Ratfor attempts to retain the merits of 
Fortran (universality, portability, efficiency) 
while hiding the worst Fortran inadequacies. 
The language is Fortran except for two aspects. 
First, since control flow is central to any program,
regardless of the specific application, the
primary task of Ratfor is to conceal this part of 
Fortran from the user, by providing decent con- 
trol flow structures. These structures are 
sufficient and comfortable for structured pro- 
gramming in the narrow sense of programming 
without GOTO's. Second, since the preprocessor 
must examine an entire program to translate the 
control structure, it is possible at the same time 
to clean up many of the “‘cosmetic”’ deficiencies 
of Fortran, and thus provide a language which is 
easier and more pleasant to read and write. 


Beyond these two aspects (control flow
and cosmetics), Ratfor does nothing about the
host of other weaknesses of Fortran. Although
it would be straightforward to extend it to
provide character strings, for example, they are not
needed by everyone, and of course the
preprocessor would be harder to implement.
Throughout, the design principle which has
determined what should be in Ratfor and what
should not has been: Ratfor doesn't know any
Fortran. Any language feature which would require
that Ratfor really understand Fortran has been
omitted. We will return to this point in the
section on implementation.


Even within the confines of control flow 
and cosmetics, we have attempted to be selective 
in what features to provide. The intent has been 
to provide a small set of the most useful con- 
structs, rather than to throw in everything that 
has ever been thought useful by someone. 


The rest of this section contains an infor- 
mal description of the Ratfor language. The con- 
trol flow aspects will be quite familiar to readers 
used to languages like Algol, PL/I, Pascal, etc., 
and the cosmetic changes are equally straightfor- 
ward. We shall concentrate on showing what the 
language looks like. 


Statement Grouping 


Fortran provides no way to group state- 
ments together, short of making them into a 
subroutine. The standard construction ‘‘if a con- 
dition is true, do this group of things,’’ for 
example, 


if (x > 100)
	{ call error("x > 100"); err = 1; return }


cannot be written directly in Fortran. Instead a 
programmer is forced to translate this relatively 
clear thought into murky Fortran, by stating the 
negative condition and branching around the 
group of statements: 


	if (x .le. 100) goto 10
	call error(7hx > 100)
	err = 1
	return
10


When the program doesn't work, or when it 
must be modified, this must be translated back 
into a clearer form before one can be sure what 
it does. 


Ratfor eliminates this error-prone and
confusing back-and-forth translation; the first
form is the way the computation is written in
Ratfor. A group of statements can be treated as
a unit by enclosing them in the braces { and }.
This is true throughout the language: wherever a
single Ratfor statement can be used, there can be
several enclosed in braces. (Braces seem clearer
and less obtrusive than begin and end or do and
end, and of course do and end already have
Fortran meanings.)


Cosmetics contribute to the readability of
code, and thus to its understandability. The
character ">" is clearer than ".GT.", so Ratfor
translates it appropriately, along with several
other similar shorthands. Although many Fortran
compilers permit character strings in quotes
(like "x > 100"), quotes are not allowed in ANSI
Fortran, so Ratfor converts them into the right
number of H's: computers count better than
people do.


Ratfor is a free-form language: statements
may appear anywhere on a line, and several may
appear on one line if they are separated by
semicolons. The example above could also be
written as


if (x > 100) {
	call error("x > 100")
	err = 1
	return
}


In this case, no semicolon is needed at the end
of each line because Ratfor assumes there is one
statement per line unless told otherwise.


Of course, if the statement that follows the 
if is a single statement (Ratfor or otherwise), no 
braces are needed: 


if (y <= 0.0 & z <= 0.0)
	write(6, 20) y, z


No continuation need be indicated because the 
statement is clearly not finished on the first line. 
In general Ratfor continues lines when it seems 
obvious that they are not yet done. (The con- 
tinuation convention is discussed in detail later.) 


Although a free-form language permits 
wide latitude in formatting styles, it is wise to 
pick one that is readable, then stick to it. In
particular, proper indentation is vital, to make the
logical structure of the program obvious to the 
reader. 


The ‘‘else’’ Clause 


Ratfor provides an else statement to handle
the construction "if a condition is true, do
this thing, otherwise do that thing."


if (a <= b)
	{ sw = 0; write(6, 1) a, b }
else
	{ sw = 1; write(6, 1) b, a }

This writes out the smaller of a and b, then the
larger, and sets sw appropriately.
The Fortran equivalent of this code is
circuitous indeed:


	if (a .gt. b) goto 10
	sw = 0
	write(6, 1) a, b
	goto 20
10	sw = 1
	write(6, 1) b, a
20


This is a mechanical translation; shorter forms
exist, as they do for many similar situations. But
all translations suffer from the same problem:
since they are translations, they are less clear and
understandable than code that is not a
translation. To understand the Fortran version, one
must scan the entire program to make sure that
no other statement branches to statements 10 or
20 before one knows that indeed this is an
if-else construction. With the Ratfor version, there
is no question about how one gets to the parts of
the statement. The if-else is a single unit, which
can be read, understood, and ignored if not
relevant. The program says what it means.


As before, if the statement following an if 
or an else is a single statement, no braces are 
needed: 


if (a <= b)
	sw = 0
else
	sw = 1


The syntax of the if statement is 


if (legal Fortran condition)
	Ratfor statement
else
	Ratfor statement


where the else part is optional. The legal Fortran
condition is anything that can legally go into a
Fortran logical IF. Ratfor does not check this
clause, since it does not know enough Fortran to
know what is permitted. The Ratfor statement is
any Ratfor or Fortran statement, or any
collection of them in braces.


Nested ifs 


Since the statement that follows an if or an
else can be any Ratfor statement, this leads
immediately to the possibility of another if or
else. As a useful example, consider this problem:
the variable f is to be set to -1 if x is less than
zero, to +1 if x is greater than 100, and to 0
otherwise. Then in Ratfor, we write


if (x < 0)
	f = -1
else if (x > 100)
	f = +1
else
	f = 0


Here the statement after the first else is another
if-else. Logically it is just a single statement,
although it is rather complicated.


This code says what it means. Any ver- 
sion written in straight Fortran will necessarily be 
indirect because Fortran does not let you say 
what you mean. And as always, clever shortcuts 
may turn out to be too clever to understand a 
year from now. 


Following an else with an if is one way to
write a multi-way branch in Ratfor. In general
the structure


if (...)
	- - -
else if (...)
	- - -
...
else if (...)
	- - -
else
	- - -


provides a way to specify the choice of exactly 
one of several alternatives. (Ratfor also provides 
a switch statement which does the same job in 
certain special cases; in more general situations, 
we have to make do with spare parts.) The tests 
are laid out in sequence, and each one is fol- 
lowed by the code associated with it. Read down 
the list of decisions until one is found that is 
satisfied. The code associated with this condition 
is executed, and then the entire structure is 
finished. The trailing else part handles the 
“default” case, where none of the other condi- 
tions apply. If there is no default action, this 
final else part is omitted: 


if (x < 0) 
x = 0 
else if (x > 100) 
x = 100 


if-else ambiguity 


There is one thing to notice about compli- 
cated structures involving nested if's and else's. 
Consider 


if (x > 0)
	if (y > 0)
		write(6, 1) x, y
	else
		write(6, 2) y


There are two ifs and only one else. Which if 
does the else go with? 


This is a genuine ambiguity in Ratfor, as it 
is in many other programming languages. The 
ambiguity is resolved in Ratfor (as elsewhere) by 
saying that in such cases the else goes with the
closest previous un-else'ed if. Thus in this case, 
the else goes with the inner if, as we have indi- 
cated by the indentation. 

It is a wise practice to resolve such cases
by explicit braces, just to make your intent clear.
In the case above, we would write 


if (x > 0) {
	if (y > 0)
		write(6, 1) x, y
	else
		write(6, 2) y
}
which does not change the meaning, but leaves 
no doubt in the reader's mind. If we want the 
other association, we must write


if (x > 0) {
	if (y > 0)
		write(6, 1) x, y
}
else
	write(6, 2) y


The ‘‘switch’’ Statement


The switch statement provides a clean way 
to express multi-way branches which branch on 
the value of some integer-valued expression. 
The syntax is 


switch (expression) {

	case expr1:
		statements

	case expr2, expr3:
		statements

	default:
		statements
}


Each case is followed by a list of
comma-separated integer expressions. The expression
inside switch is compared against the case
expressions expr1, expr2, and so on in turn until
one matches, at which time the statements
following that case are executed. If no cases match
expression, and there is a default section, the
statements with it are done; if there is no
default, nothing is done. In all situations, as
soon as some block of statements is executed,
the entire switch is exited immediately.
(Readers familiar with C [4] should beware that
this behavior is not the same as the C switch.)
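To make the difference from C concrete, here is a minimal sketch in C (the language Ratfor itself was first written in) of the Ratfor switch semantics. The function classify and its case values are hypothetical. Because C falls through from one case into the next, each arm needs an explicit break to get Ratfor's behavior, and a Ratfor comma-separated case list becomes adjacent C case labels.

```c
/* Ratfor-style switch written in C: leave the switch after the first
   matching case. classify is an illustrative example only. */
int classify(int n)
{
    int result;
    switch (n) {
    case 1:                 /* Ratfor: case 1: */
        result = 10;
        break;              /* implicit in Ratfor, required in C */
    case 2:                 /* Ratfor: case 2, 3: */
    case 3:
        result = 20;
        break;
    default:                /* Ratfor: default: */
        result = 0;
        break;
    }
    return result;
}
```

Omitting any of the breaks would let control run into the next arm, which is exactly the C behavior the paragraph above warns about.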


The ‘‘do’’ Statement


The do statement in Ratfor is quite similar 
to the DO statement in Fortran, except that it 
uses no statement number. The statement 
number, after all, serves only to mark the end of 
the DO, and this can be done just as easily with 
braces. Thus 


do i = 1, n {
	x(i) = 0.0
	y(i) = 0.0
	z(i) = 0.0
}


is the same as 


	do 10 i = 1, n
	x(i) = 0.0
	y(i) = 0.0
	z(i) = 0.0
10	continue


The syntax is: 


do legal-Fortran-DO-text 
Ratfor statement 


The part that follows the keyword do has to be 
something that can legally go into a Fortran DO
statement. Thus if a local version of Fortran 
allows DO limits to be expressions (which is not 
currently permitted in ANSI Fortran), they can be 
used in a Ratfor do. 


The Ratfor statement part will often be
enclosed in braces, but as with the if, a single
statement need not have braces around it. This
code sets an array to zero:


do i = 1, n
	x(i) = 0.0


Slightly more complicated, 


do i = 1, n
	do j = 1, n
		m(i, j) = 0


sets the entire array m to zero, and 


do i = 1, n
	do j = 1, n
		if (i < j)
			m(i, j) = -1
		else if (i == j)
			m(i, j) = 0
		else
			m(i, j) = +1


sets the upper triangle of m to -1, the diagonal
to zero, and the lower triangle to +1. (The
operator == is "equals", that is, ".EQ.".) In
each case, the statement that follows the do is
logically a single statement, even though
complicated, and thus needs no braces.
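For comparison, a sketch of the same triangle-filling loop in C, assuming a hypothetical fixed size N and C's 0-based indexing. As in the Ratfor, the if-else chain under the two loops is logically a single statement, so the loops need no braces.

```c
#define N 4   /* hypothetical matrix size */

/* Upper triangle (i < j) gets -1, diagonal 0, lower triangle +1. */
void fill_triangles(int m[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (i < j)
                m[i][j] = -1;
            else if (i == j)
                m[i][j] = 0;
            else
                m[i][j] = +1;
}
```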


‘‘break’’ and ‘‘next’’


Ratfor provides a statement for leaving a
loop early, and one for beginning the next
iteration. break causes an immediate exit from the
do; in effect it is a branch to the statement after
the do. next is a branch to the bottom of the
loop, so it causes the next iteration to be done.
For example, this code skips over negative
values in an array:


do i = 1, n {
	if (x(i) < 0.0)
		next
	process positive element
}


break and next also work in the other Ratfor 
looping constructions that we will talk about in 
the next few sections. 


break and next can be followed by an
integer to indicate breaking or iterating that level
of enclosing loop; thus


break 2


exits from two levels of enclosing loops, and
break 1 is equivalent to break. next 2 iterates
the second enclosing loop. (Realistically,
multilevel breaks and nexts are not likely to be
much used because they lead to code that is hard
to understand and somewhat risky to change.)
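C has no numbered break, so a sketch of what break 2 does in C takes a goto out of both loops (a flag variable would also work). The function find_value and its row-by-row array layout are assumptions made for illustration.

```c
/* Searches a rows x cols array stored row by row for target.
   On success stores the position in *row and *col and returns 1. */
int find_value(const int *a, int rows, int cols, int target,
               int *row, int *col)
{
    int i, j;
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            if (a[i * cols + j] == target)
                goto done;      /* like Ratfor's "break 2": leave both loops */
    return 0;                   /* both loops ran to completion: not found */
done:
    *row = i;
    *col = j;
    return 1;
}
```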


The ‘‘while’’ Statement


One of the problems with the Fortran DO
statement is that it generally insists upon being
done once, regardless of its limits. If a loop
begins


DO I = 2, 1


this will typically be done once with I set to 2,
even though common sense would suggest that
perhaps it shouldn't be. Of course a Ratfor do
can easily be preceded by a test


if (j <= k)
	do i = j, k {
		- - -
	}


but this has to be a conscious act, and is often 
overlooked by programmers. 


A more serious problem with the DO state- 
ment is that it encourages that a program be 
written in terms of an arithmetic progression 
with small positive steps, even though that may 
not be the best way to write it. If code has to be 
contorted to fit the requirements imposed by the 
Fortran DO, it is that much harder to write and
understand. 


To overcome these difficulties, Ratfor
provides a while statement, which is simply a loop:
"while some condition is true, repeat this group
of statements". It has no preconceptions about
why one is looping. For example, this routine to
compute sin(x) by the Maclaurin series combines
two termination criteria.


real function sin(x, e)
# returns sin(x) to accuracy e, by
# sin(x) = x - x**3/3! + x**5/5! - ...

sin = x
term = x

i = 3
while (abs(term) > e & i < 100) {
	term = -term * x**2 / float(i*(i-1))
	sin = sin + term
	i = i + 2
}

return
end


Notice that if the routine is entered with 
term already smaller than e, the loop will be 
done zero times, that is, no attempt will be made 
to compute x**3 and thus a potential underflow 
is avoided. Since the test is made at the top of a 
while loop instead of the bottom, a special case 
disappears: the code works at one of its
boundaries. (The test i < 100 is the other boundary,
making sure the routine stops after some
maximum number of iterations.)
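The same two-criterion loop can be sketched in C. The name series_sin is hypothetical (it avoids clashing with the library sin), and the absolute value is computed inline so the sketch needs no headers. As in the Ratfor, the test sits at the top, so an argument whose magnitude is already no larger than eps is returned unchanged without ever forming x cubed.

```c
/* Sine by its Maclaurin series: stop when the last term added is no
   larger than eps in magnitude, or when i reaches 100. */
double series_sin(double x, double eps)
{
    double sum = x;         /* first term of the series */
    double term = x;
    int i = 3;
    while ((term < 0 ? -term : term) > eps && i < 100) {
        term = -term * x * x / (double)(i * (i - 1));  /* next odd term */
        sum = sum + term;
        i = i + 2;
    }
    return sum;
}
```

For example, series_sin(0.5, 1e-12) agrees with sin(0.5) = 0.4794255386... to well within the requested accuracy.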


As an aside, a sharp character "#" in a
line marks the beginning of a comment; the rest
of the line is comment. Comments and code can
co-exist on the same line, so one can make
marginal remarks, which is not possible with
Fortran's "C in column 1" convention. Blank
lines are also permitted anywhere (they are not
in Fortran); they should be used to emphasize
the natural divisions of a program.


The syntax of the while statement is 


while (legal Fortran condition)
	Ratfor statement


As with the if, legal Fortran condition is
something that can go into a Fortran logical IF, and
Ratfor statement is a single statement, which may
be multiple statements in braces.


The while encourages a style of coding not 
normally practiced by Fortran programmers. For 
example, suppose nextch is a function which 
returns the next input character both as a func- 
tion value and in its argument. Then a loop to 
find the first non-blank character is just 


while (nextch(ich) == iblank)
	;


A semicolon by itself is a null statement, which
is necessary here to mark the end of the while;
if it were not present, the while would control
the next statement. When the loop is broken,
ich contains the first non-blank. Of course the
same code can be written in Fortran as


100	if (nextch(ich) .eq. iblank) goto 100


but many Fortran programmers (and a few com- 
pilers) believe this line is illegal. The language at 
one's disposal strongly influences how one thinks 
about a problem. 
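A C rendering of the skip-blanks loop, sketched under assumptions: nextch here is a hypothetical stand-in that reads from a string (not the paper's routine), returning each character both as its value and through its pointer argument, and stopping at the terminating NUL. The empty statement plays the same role as the lone semicolon in the Ratfor.

```c
static const char *input;   /* hypothetical input source */

/* Returns the next character both as the value and through *ch;
   keeps returning 0 once the end of the string is reached. */
static int nextch(int *ch)
{
    *ch = (unsigned char)*input;
    if (*input != '\0')
        input++;
    return *ch;
}

/* Skips blanks; returns the first non-blank, or 0 at end of string. */
int first_nonblank(const char *s)
{
    int ich;
    input = s;
    while (nextch(&ich) == ' ')
        ;                   /* null statement: the test does all the work */
    return ich;
}
```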


The ‘‘for’’ Statement 


The for statement is another Ratfor loop, 
which attempts to carry the separation of loop- 
body from reason-for-looping a step further than 
the while. A for statement allows explicit initiali- 
zation and increment steps as part of the state- 
ment. For example, a DO loop is just 


for (i = 1; i <= n; i = i + 1) ...

This is equivalent to


i = 1
while (i <= n) {
	...
	i = i + 1
}

The initialization and increment of i have been
moved into the for statement, making it easier to
see at a glance what controls the loop.


The for and while versions have the
advantage that they will be done zero times if n
is less than 1; this is not true of the do.


The loop of the sine routine in the previous
section can be rewritten with a for as


for (i = 3; abs(term) > e & i < 100; i = i + 2) {
	term = -term * x**2 / float(i*(i-1))
	sin = sin + term
}


The syntax of the for statement is 


for ( init ; condition ; increment )
	Ratfor statement


init is any single Fortran statement, which gets
done once before the loop begins. increment is
any single Fortran statement, which gets done at
the end of each pass through the loop, before
the test. condition is again anything that is legal
in a logical IF. Any of init, condition, and
increment may be omitted, although the semicolons
must always be present. A non-existent condition
is treated as always true, so for(;;) is an
indefinite repeat. (But see the repeat-until in
the next section.)


The for statement is particularly useful for 
backward loops, chaining along lists, loops that 
might be done zero times, and similar things 
which are hard to express with a DO statement, 
and obscure to write out with IF’s and GOTO’s. 
For example, here is a backwards DO loop to find 
the last non-blank character on a card: 


for (i = 80; i > 0; i = i - 1)
	if (card(i) != blank)
		break


("!=" is the same as ".NE.".) The code scans
the columns from 80 through to 1. If a
non-blank is found, the loop is immediately broken.
(break and next work in for's and while's just as
in do's.) If i reaches zero, the card is all blank.


This code is rather nasty to write with a 
regular Fortran DO, since the loop must go for- 
ward, and we must explicitly set up proper condi- 
tions when we fall out of the loop. (Forgetting 
this is a common error.) Thus: 


	DO 10 J = 1, 80
	I = 81 - J
	IF (CARD(I) .NE. BLANK) GO TO 11
10	CONTINUE
	I = 0
11


The version that uses the for handles the termi- 
nation condition properly for free: i is zero when 
we fall out of the for loop. 
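A sketch of the same backward scan in C, assuming the card is a hypothetical 80-character array. Since C arrays start at 0, the loop keeps the paper's 1-based column number i and subtracts one only when indexing. Falling out of the loop leaves i at zero, so the all-blank case again comes for free.

```c
/* Returns the 1-based column of the last non-blank character on an
   80-column card, or 0 if the card is entirely blank. */
int last_nonblank(const char card[80])
{
    int i;
    for (i = 80; i > 0; i = i - 1)
        if (card[i - 1] != ' ')     /* column i is element i-1 in C */
            break;
    return i;                       /* 0 when the loop runs off the front */
}
```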


The increment in a for need not be an
arithmetic progression; the following program
walks along a list (stored in an integer array ptr)
until a zero pointer is found, adding up elements
from a parallel array of values:


sum = 0.0
for (i = first; i > 0; i = ptr(i))
	sum = sum + value(i)


Notice that the code works correctly if the list is 
empty. Again, placing the test at the top of a 
loop instead of the bottom eliminates a potential 
boundary error. 
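The list walk translates almost word for word into C. In this sketch (function and array names are illustrative), ptr and value are indexed from 1 as in the Ratfor, slot 0 is unused, and a zero pointer ends the list; an empty list is simply first == 0.

```c
/* Sums value[i] over the list that starts at first and is linked
   through ptr[]; index 0 marks the end of the list. */
double sum_list(int first, const int ptr[], const double value[])
{
    double sum = 0.0;
    for (int i = first; i > 0; i = ptr[i])
        sum = sum + value[i];
    return sum;
}
```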


The ‘‘repeat-until’’ statement 


In spite of the dire warnings, there are 
times when one really needs a loop that tests at 
the bottom after one pass through. This service 
is provided by the repeat-until: 


repeat
	Ratfor statement
until (legal Fortran condition)


The Ratfor statement part is done once, then the
condition is evaluated. If it is true, the loop is
exited; if it is false, another pass is made.


The until part is optional, so a bare repeat 
is the cleanest way to specify an infinite loop. Of 
course such a loop must ultimately be broken by 
some transfer of control such as stop, return, or 
break, or an implicit stop such as running out of
input with a READ statement. 


As a matter of observed fact [8], the
repeat-until statement is much less used than the
other looping constructions; in particular, it is
typically outnumbered ten to one by for and 
while. Be cautious about using it, for loops that 
test only at the bottom often don't handle null 
cases well. 
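One place where a bottom-tested loop is genuinely the right tool is counting the decimal digits of a number, since zero still has one digit. A minimal C sketch (ndigits is a hypothetical name) uses do-while, C's counterpart of repeat-until. Note that C's do-while repeats while its condition is true, whereas Ratfor's until exits when its condition becomes true, so the test is inverted.

```c
/* Counts decimal digits; the body must run at least once so that
   0 is reported as having one digit. */
int ndigits(unsigned n)
{
    int count = 0;
    do {                    /* Ratfor: repeat { ... } until (n == 0) */
        count = count + 1;
        n = n / 10;
    } while (n != 0);
    return count;
}
```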


More on break and next 


break exits immediately from do, while,
for, and repeat-until. next goes to the test part 
of do, while and repeat-until, and to the incre- 
ment step of a for. 


‘‘return’’ Statement


The standard Fortran mechanism for 
returning a value from a function uses the name 
of the function as a variable which can be 
assigned to; the last value stored in it is the 


function value upon return. For example, here. 


is a routine equal which returns | if two arrays 
are identical, and zero if they differ. The array 
ends are marked by the special value —1. 


# equal - compare str1 to str2;
#	return 1 if equal, 0 if not
integer function equal(str1, str2)
integer str1(100), str2(100)
integer i

for (i = 1; str1(i) == str2(i); i = i + 1)
	if (str1(i) == -1) {
		equal = 1
		return
	}
equal = 0
return
end


In many languages (e.g., PL/I) one instead 
says 


return (expression) 


to return a value from a function. Since this is
often clearer, Ratfor provides such a return
statement: in a function F, return(expression)
is equivalent to


{ F = expression; return }

For example, here is equal again:


# equal - compare str1 to str2;
#	return 1 if equal, 0 if not
integer function equal(str1, str2)
integer str1(100), str2(100)
integer i

for (i = 1; str1(i) == str2(i); i = i + 1)
	if (str1(i) == -1)
		return(1)
return(0)
end


If there is no parenthesized expression after 
return, a normal RETURN is made. (Another 
version of equal is presented shortly.) 
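For readers who know C, here is a sketch of equal with the same -1 end marker. It assumes 0-based arrays and, like the Ratfor version, that both arrays are properly terminated; the return inside the loop corresponds to return(1).

```c
#define EOS (-1)    /* end-of-array marker, as in the text */

/* Returns 1 if the two EOS-terminated arrays are identical, 0 if not. */
int equal(const int str1[], const int str2[])
{
    for (int i = 0; str1[i] == str2[i]; i = i + 1)
        if (str1[i] == EOS)
            return 1;       /* reached the end with no difference */
    return 0;               /* stopped at the first differing element */
}
```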


Cosmetics 


As we said above, the visual appearance of 
a language has a substantial effect on how easy it 
is to read and understand programs. Accord- 
ingly, Ratfor provides a number of cosmetic 
facilities which may be used to make programs 
more readable. 




Free-form Input 


Statements can be placed anywhere on a
line; long statements are continued
automatically, as are long conditions in if, while, for, and
until. Blank lines are ignored. Multiple
statements may appear on one line, if they are
separated by semicolons. No semicolon is
needed at the end of a line, if Ratfor can make
some reasonable guess about whether the
statement ends there. Lines ending with any of the
characters


	=  +  -  *  ,  |  &  (  _


are assumed to be continued on the next line.
Underscores are discarded wherever they occur;
all others remain as part of the statement.


Any statement that begins with an
all-numeric field is assumed to be a Fortran label,
and placed in columns 1-5 upon output. Thus


write(6, 100); 100 format("hello")


is converted into


	write(6, 100)
100	format(5hhello)


Translation Services


Text enclosed in matching single or double
quotes is converted to nH... but is otherwise
unaltered (except for formatting: it may get
split across card boundaries during the
reformatting process). Within quoted strings, the
backslash "\" serves as an escape character: the
next character is taken literally. This provides a
way to get quotes (and of course the backslash
itself) into quoted strings:


"\\\'"


is a string containing a backslash and an
apostrophe. (This is not the standard convention of
doubled quotes, but it is easier to use and more
general.)
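The quoting rule can be sketched in C as follows. to_hollerith is a hypothetical helper: it takes the text between the quotes, drops each escaping backslash while keeping the character after it, and prefixes the count of remaining characters followed by h. Splitting long strings across cards is not attempted, and strings longer than an internal 255-character buffer are truncated.

```c
#include <stdio.h>

/* Converts the body of a quoted string to Fortran nH form in out.
   A backslash makes the next character literal. */
void to_hollerith(const char *s, char *out, size_t outsize)
{
    char body[256];
    size_t n = 0;
    for (; *s != '\0' && n < sizeof body - 1; s++) {
        if (*s == '\\' && s[1] != '\0')
            s++;                /* skip the backslash, keep the next char */
        body[n++] = *s;
    }
    body[n] = '\0';
    snprintf(out, outsize, "%zuh%s", n, body);  /* count, then h, then text */
}
```

Called on the two-character body of the example string above, it yields 2h followed by a backslash and an apostrophe.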


Any line that begins with the character "%"
is left absolutely unaltered except for stripping
off the "%" and moving the line one position to
the left. This is useful for inserting control
cards, and other things that should not be
transmogrified (like an existing Fortran
program). Use "%" only for ordinary statements,
not for the condition parts of if, while, etc., or
the output may come out in an unexpected place.


The following character translations are
made, except within single or double quotes or
on a line beginning with a "%".


	==	.eq.		!=	.ne.
	>	.gt.		>=	.ge.
	<	.lt.		<=	.le.
	&	.and.		|	.or.
	!	.not.		^	.not.


In addition, the following translations are
provided for input devices with restricted character
sets.


	[	{		]	}
	$(	{		$)	}
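A sketch of these translations in C, with one point worth noting: two-character operators must be tried before their one-character prefixes, or != would come out as .not.= instead of .ne. For brevity the sketch ignores quoted strings and "%" lines, and it leaves out the second negation symbol in the table. translate_ops is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Ratfor shorthand and its Fortran spelling; longer operators first. */
static const char *ops[][2] = {
    {"==", ".eq."}, {"!=", ".ne."}, {">=", ".ge."}, {"<=", ".le."},
    {">", ".gt."},  {"<", ".lt."},  {"&", ".and."}, {"|", ".or."},
    {"!", ".not."},
};
#define NOPS (sizeof ops / sizeof ops[0])

/* Copies in to out, rewriting operators; stops early if out is nearly full. */
void translate_ops(const char *in, char *out, size_t outsize)
{
    size_t n = 0;
    out[0] = '\0';
    while (*in != '\0' && n + 6 < outsize) {
        size_t k, len = 0;
        for (k = 0; k < NOPS; k++) {
            len = strlen(ops[k][0]);
            if (strncmp(in, ops[k][0], len) == 0)
                break;                      /* first (longest) match wins */
        }
        if (k < NOPS) {
            n += (size_t)snprintf(out + n, outsize - n, "%s", ops[k][1]);
            in += len;
        } else {
            out[n++] = *in++;               /* ordinary character */
            out[n] = '\0';
        }
    }
}
```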


‘‘define’’ Statement


Any string of alphanumeric characters can
be defined as a name: thereafter, whenever that
name occurs in the input (delimited by
non-alphanumerics) it is replaced by the rest of the
definition line. (Comments and trailing white
spaces are stripped off.) A defined name can be
arbitrarily long, and must begin with a letter.


define is typically used to create symbolic 
parameters: 


define ROWS 100 
define COLS 50 


dimension a(ROWS), b(ROWS, COLS)
if (i > ROWS | j > COLS) ...

Alternately, definitions may be written as

define(ROWS, 100)


In this case, the defining text is everything after
the comma up to the balancing right parenthesis;
this allows multi-line definitions.
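A minimal sketch of define-style replacement in C, assuming one name and value per call (apply it once per definition) and treating a name as a whole run of letters and digits, so ROWSX is left alone. The real preprocessor also strips comments and handles the parenthesized form; substitute is a hypothetical helper that does neither.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copies in to out, replacing each alphanumeric token equal to name
   with value. Output is truncated (but still terminated) if too long. */
void substitute(const char *in, const char *name, const char *value,
                char *out, size_t outsize)
{
    size_t n = 0, namelen = strlen(name);
    out[0] = '\0';
    while (*in != '\0') {
        int w;
        if (isalnum((unsigned char)*in)) {
            const char *start = in;
            while (isalnum((unsigned char)*in))
                in++;                       /* scan the whole token */
            size_t toklen = (size_t)(in - start);
            if (toklen == namelen && strncmp(start, name, namelen) == 0)
                w = snprintf(out + n, outsize - n, "%s", value);
            else
                w = snprintf(out + n, outsize - n, "%.*s", (int)toklen, start);
        } else {
            w = snprintf(out + n, outsize - n, "%c", *in++);
        }
        n += (size_t)w;
        if (n >= outsize)
            return;                         /* truncated; already terminated */
    }
}
```

Applying it for ROWS and then for COLS turns the dimension statement above into dimension a(100), b(100, 50).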


It is generaily a wise practice to use sym- 
bolic parameters for most constants, to help 
make clear the function of what would otherwise 
be mysterious numbers. As an example, here is 
the routine equal again, this time with symbolic 
constants. 


define YES 1
define NO 0
define EOS -1
define ARB 100


# equal - compare str1 to str2;
#	return YES if equal, NO if not
integer function equal(str1, str2)
integer str1(ARB), str2(ARB)
integer i


for (i = 1; str1(i) == str2(i); i = i + 1)
	if (str1(i) == EOS)
		return(YES)
return(NO)
end


‘‘include’’ Statement
The statement
include file


inserts the file found on input stream file into the
Ratfor input in place of the include statement.
The standard usage is to place COMMON blocks
on a file, and include that file whenever a copy is
needed:


subroutine x
	include commonblocks
end
subroutine y
	include commonblocks
end
This ensures that all copies of the COMMON
blocks are identical.


Pitfalls, Botches, Blemishes and other Failings 


Ratfor catches certain syntax errors, such
as missing braces, else clauses without an if, and
most errors involving missing parentheses in
statements. Beyond that, since Ratfor knows no
Fortran, any errors you make will be reported by
the Fortran compiler, so you will from time to
time have to relate a Fortran diagnostic back to
the Ratfor source.


Keywords are reserved: using if, else,
etc., as variable names will typically wreak havoc. 
Don't leave spaces in keywords. Don’t use the 
Arithmetic IF. 


The Fortran nH convention is not recog- 
nized anywhere by Ratfor: use quotes instead. 


3. IMPLEMENTATION 


Ratfor was originally written in C [4] on the
UNIX operating system [5]. The language is
specified by a context free grammar and the 
compiler constructed using the YACC compiler- 
compiler [6]. 


The Ratfor grammar is simple and straight- 
forward, being essentially 


prog:	stat
	| prog stat
stat:	if (...) stat
	| if (...) stat else stat
	| while (...) stat
	| for (...; ...; ...) stat
	| do ... stat
	| repeat stat
	| repeat stat until (...)
	| switch (...) { case ...: prog ... default: prog }
	| return
	| break
	| next
	| digits stat
	| { prog }
	| anything unrecognizable


The observation that Ratfor knows no Fortran
follows directly from the rule that says a
statement is "anything unrecognizable". In fact most
of Fortran falls into this category, since any
statement that does not begin with one of the
keywords is by definition "unrecognizable."


Code generation is also simple. If the first
thing on a source line is not a keyword (like if,
else, etc.) the entire statement is simply copied
to the output with appropriate character
translation and formatting. (Leading digits are treated
as a label.) Keywords cause only slightly more
complicated actions. For example, when if is
recognized, two consecutive labels L and L+1
are generated and the value of L is stacked. The
condition is then isolated, and the code


	if (.not. (condition)) goto L


is output. The statement part of the if is then
translated. When the end of the statement is
encountered (which may be some distance away
and include nested if's, of course), the code


L	continue


is generated, unless there is an else clause, in
which case the code is


	goto L+1
L	continue


In this latter case, the code


L+1	continue


is produced after the statement part of the else.
Code generation for the various loops is equally
simple.
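The if scheme just described can be sketched in C as a few emitter routines sharing an explicit stack of labels; nested ifs work because each if pops only its own L. Everything here is an assumption for illustration: the function names, the starting label 100, the fixed stack depth, and collecting the output in a string rather than writing Fortran columns.

```c
#include <stdio.h>
#include <string.h>

static char out[4096];          /* generated Fortran, one statement per line */
static int next_label = 100;    /* hypothetical first label */
static int label_stack[64];     /* L for each if still open */
static int sp = 0;

static void emit(const char *line)
{
    strncat(out, line, sizeof out - strlen(out) - 1);
    strncat(out, "\n", sizeof out - strlen(out) - 1);
}

/* On "if": reserve L and L+1, stack L, branch around the then-part. */
void begin_if(const char *cond)
{
    char buf[256];
    int L = next_label;
    next_label += 2;
    label_stack[sp++] = L;
    snprintf(buf, sizeof buf, "if (.not. (%s)) goto %d", cond, L);
    emit(buf);
}

/* On "else": jump over the else-part, then place label L. */
void begin_else(void)
{
    char buf[64];
    int L = label_stack[sp - 1];
    snprintf(buf, sizeof buf, "goto %d", L + 1);
    emit(buf);
    snprintf(buf, sizeof buf, "%d continue", L);
    emit(buf);
}

/* At the end of the whole statement: place L, or L+1 after an else. */
void end_if(int had_else)
{
    char buf[64];
    int L = label_stack[--sp];
    snprintf(buf, sizeof buf, "%d continue", had_else ? L + 1 : L);
    emit(buf);
}

/* Statements that are not keywords are copied through unchanged. */
void plain(const char *stmt) { emit(stmt); }
```

Calling begin_if("a .le. b"), plain("sw = 0"), begin_else(), plain("sw = 1"), end_if(1) leaves in out the test, the two assignments, a goto 101, and the labels 100 and 101 in the order the paragraph above describes.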


One might argue that more care should be
taken in code generation. For example, if there
is no trailing else,


if (i > 0) x = a

should be left alone, not converted into


	if (.not. (i .gt. 0)) goto 100
	x = a
100	continue


But what are optimizing compilers for, if not to
improve code? It is a rare program indeed where
this kind of "inefficiency" will make even a
measurable difference. In the few cases where it
is important, the offending lines can be protected
by "%".

The use of a compiler-compiler is
definitely the preferred method of software
development. The language is well-defined, with
few syntactic irregularities. Implementation is
quite simple; the original construction took
under a week. The language is sufficiently
simple, however, that an ad hoc recognizer can be
readily constructed to do the same job if no
compiler-compiler is available.


The C version of Ratfor is used on UNIX 
and on the Honeywell GCOS systems. C com- 
pilers are not as widely available as Fortran, 
however, so there is also a Ratfor written in 
itself and originally bootstrapped with the C ver- 
sion. The Ratfor version was written so as to 
translate into the portable subset of Fortran 
described in [1], so it is portable, having been 
run essentially without change on at least twelve
distinct machines. (The main restrictions of the
portable subset are: only one character per
machine word; subscripts in the form c*v±c;
avoiding expressions in places like DO loops;
consistency in subroutine argument usage, and in
COMMON declarations. Ratfor itself will not
gratuitously generate non-standard Fortran.)


The Ratfor version is about 1500 lines of 
Ratfor (compared to about 1000 lines of C); this
compiles into 2500 lines of Fortran. This expan- 
sion ratio is somewhat higher than average, since 
the compiled code contains unnecessary 
occurrences of COMMON declarations. The exe- 
cution time of the Ratfor version is dominated 
by two routines that read and write cards. 
Clearly these routines could be replaced by 
machine-coded local versions; unless this is
done, the efficiency of other parts of the transla- 
tion process is largely irrelevant. 


4. EXPERIENCE 


Good Things 


"It's so much better than Fortran" is the
most common response of users when asked
how well Ratfor meets their needs. Although
cynics might consider this to be vacuous, it does
seem to be true that decent control flow and
cosmetics convert Fortran from a bad language
into quite a reasonable one, assuming that
Fortran data structures are adequate for the task at
hand.


Although there are no quantitative results, 
users feel that coding in Ratfor is at least twice
as fast as in Fortran. More important, debugging 
and subsequent revision are much faster than in 
Fortran. Partly this is simply because the code 
can be read. The looping statements which test 
at the top instead of the bottom seem to elim- 
inate or at least reduce the occurrence of a wide 
class of boundary errors. And of course it is 
easy to do structured programming in Ratfor; 
this self-discipline also contributes markedly to 
reliability. 


One interesting and encouraging fact is
that programs written in Ratfor tend to be as
readable as programs written in more modern
languages like Pascal. Once one is freed from
the shackles of Fortran's clerical detail and rigid
input format, it is easy to write code that is
readable, even esthetically pleasing. For example,
here is a Ratfor implementation of the linear
table search discussed by Knuth [7]:


A(m+1) = x
for (i = 1; A(i) != x; i = i + 1)
	;
if (i > m) {
	m = i
	B(i) = 1
}
else
	B(i) = B(i) + 1
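A C sketch of the same search, keeping the 1-based indexing of the Ratfor: A[1..*m] hold the keys seen so far and B counts how often each was looked up, and both arrays are assumed to have room for index *m + 1. The name lookup is hypothetical. Storing x in A[*m + 1] first guarantees the loop stops, which is why it needs no bounds test.

```c
/* Finds x in A[1..*m], appending it if absent; counts lookups in B.
   Returns the position of x. */
int lookup(int A[], int B[], int *m, int x)
{
    int i;
    A[*m + 1] = x;                  /* sentinel: the search must stop */
    for (i = 1; A[i] != x; i = i + 1)
        ;
    if (i > *m) {                   /* found only the sentinel: new key */
        *m = i;
        B[i] = 1;
    } else
        B[i] = B[i] + 1;
    return i;
}
```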


A large corpus (5400 lines) of Ratfor, including 
a subset of the Ratfor preprocessor itself, can be 
found in [8]. 


Bad Things 


The biggest single problem is that many
Fortran syntax errors are not detected by Ratfor
but by the local Fortran compiler. The compiler
then prints a message in terms of the generated
Fortran, and in a few cases this may be difficult
to relate back to the offending Ratfor line,
especially if the implementation conceals the
generated Fortran. This problem could be dealt with
by tagging each generated line with some
indication of the source line that created it, but this is
inherently implementation-dependent, so no
action has yet been taken. Error message
interpretation is actually not so arduous as might
be thought. Since Ratfor generates no variables,
only a simple pattern of IF's and GOTO's,
data-related errors like missing DIMENSION statements
are easy to find in the Fortran. Furthermore,
there has been a steady improvement in Ratfor's
ability to catch trivial syntactic errors like
unbalanced parentheses and quotes.


There are a number of implementation 
weaknesses that are a nuisance, especially to new 
users. For example, keywords are reserved.
This rarely makes any difference, except for 
those hardy souls who want to use an Arithmetic 
IF. A few standard Fortran constructions are not 
accepted by Ratfor, and this is perceived as a 
problem by users with a large corpus of existing 
Fortran programs. Protecting every line with a
‘%’ is not really a complete solution, although it
serves as a stop-gap. The best long-term solution
is provided by the program Struct [9], which
converts arbitrary Fortran programs into Ratfor. 


Users who export programs often complain
that the generated Fortran is ‘unreadable’
because it is not tastefully formatted and contains
extraneous CONTINUE statements. To some
extent this can be ameliorated (Ratfor now has
an option to copy Ratfor comments into the generated
Fortran), but it has always seemed that
effort is better spent on the input language than
on the output esthetics.


One final problem is partly attributable to 
success — since Ratfor is relatively easy to 
modify, there are now several dialects of Ratfor. 
Fortunately, so far most of the differences are in
character set, or in invisible aspects like code 
generation. 


5. CONCLUSIONS 


Ratfor demonstrates that with modest 
effort it is possible to convert Fortran from a bad 
language into quite a good one. A preprocessor 
is clearly a useful way to extend or ameliorate 
the facilities of a base language. 


When designing a language, it is important 
to concentrate on the essential requirement of
providing the user with the best language possi- 
ble for a given effort. One must avoid throwing 
in ‘features’ — things which the user may trivi- 
ally construct within the existing framework. 


One must also avoid getting sidetracked on 
irrelevancies. For instance it seems pointless for
Ratfor to prepare a neatly formatted listing of
either its input or its output. The user is
presumably capable of the self-discipline required
to prepare neat input that reflects his thoughts.
It is much more important that the language provide
free-form input so he can format it neatly.
No one should read the output anyway except in 
the most dire circumstances. 
Appendix: Usage on UNIX and GCOS.
Beware — local customs vary. Check with a native before going into the jungle. 


UNIX 


The program ratfor is the basic translator; it takes either a list of file names or the standard input
and writes Fortran on the standard output. Options include -6x, which uses x as a continuation
character in column 6 (UNIX uses & in column 1), and -C, which causes Ratfor comments to be copied
into the generated Fortran.


The program rc provides an interface to the ratfor command which is much the same as cc. Thus


rc [options] files


compiles the files specified by files. Files with names ending in .r are Ratfor source; other files are
assumed to be for the loader. The flags -C and -6x described above are recognized, as are


-c    compile only; don't load
-f    save intermediate Fortran .f files
-F    Ratfor only; implies -c and -f
-2    use big Fortran compiler (for large programs)
-U    flag undeclared variables (not universally available)


Other flags are passed on to the loader.


GCOS


The program ./ratfor is the bare translator, and is identical to the UNIX version, except that the 
continuation convention is & in column 6. Thus 


./ratfor files >output
translates the Ratfor source on files and collects the generated Fortran on file ‘output’ for subsequent 
processing. 


./rc provides much the same services as rc (within the limitations of GCOS), regrettably with a
somewhat different syntax. Options recognized by ./rc include


name      Ratfor source or library, depending on type
h=/name   make TSS H* file (runnable version); run as /name
r=/name   update and use random library
d=        compile as ascii (default is bcd)
C=        copy comments into Fortran
f=name    Fortran source file
g=name    gmap source file


Other options are as specified for the ./cc command described in [4].


TSO, TSS, and other systems 
Ratfor exists on various other systems; check with the author for specifics.
ABSTRACT


M4 is a macro processor available on UNIX and GCOS. Its primary use 
has been as a front end for Ratfor for those cases where parameterless macros 
are not adequately powerful. It has also been used for languages as disparate as 
C and Cobol. M4 is particularly suited for functional languages like Fortran,
PL/I and C since macros are specified in a functional notation. 


M4 provides features seldom found even in much larger macro proces- 
sors, including 


• arguments
• condition testing
• arithmetic capabilities
• string and substring functions
• file manipulation


This paper is a user’s manual for M4. 
Introduction


A macro processor is a useful way to 
enhance a programming language, to make 
it more palatable or more readable, or to 
tailor it to a particular application. The 
#define statement in C and the analogous 
define in Ratfor are examples of the basic 
facility provided by any macro processor — 
replacement of text by other text. 


The M4 macro processor is an exten- 
sion of a macro processor called M3 which 
was written by D. M. Ritchie for the AP-3 
minicomputer; M3 was in turn based on a 
macro processor implemented for [1].
Readers unfamiliar with the basic ideas of 
macro processing may wish to read some of 
the discussion there. 


M4 is a suitable front end for Ratfor 
and C, and has also been used successfully 
with Cobol. Besides the straightforward 
replacement of one string of text by 
another, it provides macros with arguments, 
conditional macro expansion, arithmetic, file 
manipulation, and some specialized string 
processing functions.


The basic operation of M4 is to copy 
its input to its output. As the input is read, 
however, each alphanumeric ‘‘token’’ (that 
is, string of letters and digits) is checked. If 
it is the name of a macro, then the name of 
the macro is replaced by its defining text, 
and the resulting string is pushed back onto 
the input to be rescanned. Macros may be 
called with arguments, in which case the 
arguments are collected and substituted into 
the right places in the defining text before it 
is rescanned. 
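The copy-check-push-back loop just described can be sketched in a few lines of Python for macros without arguments. This is an illustration under stated assumptions, not M4's implementation. The expand function, the dict of definitions, and the loop limit are hypothetical. Only a whole alphanumeric token is looked up, so NNN is not mistaken for N. A replacement is placed back in front of the unread input, so macros defined in terms of other macros are expanded fully.

```python
import re

# An alphanumeric token is a maximal run of letters, digits and underscores;
# any other character is copied through one at a time.
TOKEN = re.compile(r"[A-Za-z0-9_]+|.", re.DOTALL)


def expand(text, macros, limit=10000):
    """Copy text to the output, replacing parameterless macro names.

    Each replacement is pushed back onto the input and rescanned.
    """
    out = []
    pending = text                      # input not yet read
    replacements = 0
    while pending:
        tok = TOKEN.match(pending).group(0)
        pending = pending[len(tok):]
        if tok in macros:
            replacements += 1
            if replacements > limit:    # e.g. a macro defined as itself
                raise RecursionError("macro expansion does not terminate")
            pending = macros[tok] + pending   # push back and rescan
        else:
            out.append(tok)
    return "".join(out)
```

For example, expand("if (NNN > N)", {"N": "100"}) leaves NNN alone and yields "if (NNN > 100)".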


M4 provides a collection of about 
twenty built-in macros which perform vari- 
ous useful operations; in addition, the user 
can define new macros. Built-ins and user-defined
macros work exactly the same way,
except that some of the built-in macros have
side effects on the state of the process. 


Usage 
On UNIX, use 
m4 [files] 


Each argument file is processed in order; if 
there are no arguments, or if an argument is 
‘-’, the standard input is read at that point.
The processed text is written on the stan- 
dard output, which may be captured for sub- 
sequent processing with 


m4 [files] >outputfile 


On GCOS, usage is identical, but the pro- 
gram is called ./m4. 


Defining Macros 


The primary built-in function of M4 is 
define, which is used to define new macros. 
The input 


define(name, stuff)


causes the string name to be defined as 
stuff. All subsequent occurrences of name 
will be replaced by stuff. name must be 
alphanumeric and must begin with a letter 
(the underscore _ counts as a letter). stuff 
is any text that contains balanced
parentheses; it may stretch over multiple 
lines. 


Thus, as a typical example,


define(N, 100)


if (i > N)


defines N to be 100, and uses this ‘‘symbolic
constant’’ in a later if statement.


The left parenthesis must immediately 
follow the word define, to signal that define 
has arguments. If a macro or built-in name
is not followed immediately by ‘(’, it is
assumed to have no arguments. This is the 
situation for N above; it is actually a macro 
with no arguments, and thus when it is used 
there need be no (...) following it. 


You should also notice that a macro 
name is only recognized as such if it appears 
surrounded by non-alphanumerics. For 
example, in 


define(N, 100) 


if (NNN > 100) 


the variable NNN is absolutely unrelated to
the defined macro N, even though it con- 
tains a lot of N’s. 


Things may be defined in terms of 
other things. For example, 


define(N, 100) 
define(M, N) 


defines both M and N to be 100. 


What happens if N is redefined? Or, 
to say it another way, is M defined as N or 
as 100? In M4, the latter is true — M is 
100, so even if N subsequently changes, M 
does not. 


This behavior arises because M4 
expands macro names into their defining 
text as soon as it possibly can. Here, that 
means that when the string N is seen as the 
arguments of define are being collected, it is 
immediately replaced by 100; it’s just as if 
you had said 


define(M, 100) 


in the first place. 


If this isn’t what you really want, there 
are two ways out of it. The first, which is 
specific to this situation, is to interchange 
the order of the definitions: 


define(M, N) 
define(N, 100) 


Now M is defined to be the string N, so 
when you ask for M later, you'll always get 
the value of N at that time (because the M 
will be replaced by N which will be replaced 
by 100). 


Quoting 


The more general solution is to delay 
the expansion of the arguments of define by 
quoting them. Any text surrounded by the 
single quotes ‘ and ’ is not expanded
immediately, but has the quotes stripped off. 
If you say 


define(N, 100) 
define(M, ‘N’) 


the quotes around the N are stripped off as 
the argument is being collected, but they 
have served their purpose, and M is defined 
as the string N, not 100. The general rule is
that M4 always strips off one level of single 
quotes whenever it evaluates something. 
This is true even outside of macros. If you 
want the word define to appear in the out- 
put, you have to quote it in the input, as in 


‘define’ = 1;
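The difference between an unquoted and a quoted second argument to define comes down to when the text is expanded. The Python sketch below models only that: an unquoted body is expanded while it is collected, and a quoted body is stored as written. The define and expand helpers and the quoted flag are hypothetical stand-ins for M4's argument collection, not M4's own code.

```python
import re

WORD = re.compile(r"[A-Za-z0-9_]+|.", re.DOTALL)


def expand(text, macros):
    """Replace macro names, rescanning each replacement (no loop guard)."""
    out, pending = [], text
    while pending:
        tok = WORD.match(pending).group(0)
        pending = pending[len(tok):]
        if tok in macros:
            pending = macros[tok] + pending
        else:
            out.append(tok)
    return "".join(out)


def define(macros, name, body, quoted=False):
    # Unquoted: the body is expanded now, so today's value is frozen in.
    # Quoted: one level of quotes is stripped and the body is kept as is,
    # to be expanded each time the macro is used.
    macros[name] = body if quoted else expand(body, macros)
```

With define(m, "M", "N") the later redefinition of N does not affect M. With define(m, "M", "N", quoted=True), M follows N.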


As another instance of the same thing,
which is a bit more surprising, consider 
redefining N: 


define(N, 100) 


define(N, 200)


Perhaps regrettably, the N in the second 
definition is evaluated as soon as it’s seen: 
that is, it is replaced by 100, so it’s as if you 
had written 


define(100, 200) 


This statement is ignored by M4, since you 
can only define things that look like names, 
but it obviously doesn’t have the effect you 
wanted. To really redefine N, you must 
delay the evaluation by quoting: 


define(N, 100) 
define(‘N’, 200)


In M4, it is often wise to quote the first argument of a macro.


If ‘ and ’ are not convenient for some
reason, the quote characters can be changed 
with the built-in changequote: 


changequote([, ])


makes the new quote characters the left and 
right brackets. You can restore the original 
characters with just 


changequote 


There are two additional built-ins 
related to define. undefine removes the 
definition of some macro or built-in: 


undefine(‘N’) 


removes the definition of N. (Why are the 
quotes absolutely necessary?) Built-ins can 
be removed with undefine, as in 


undefine(‘define’) 


but once you remove one, you can never 
get it back. 


The built-in ifdef provides a way to 
determine if a macro is currently defined. 
In particular, M4 has pre-defined the names 
unix and gcos on the corresponding sys- 
tems, so you can tell which one you’re 
using: 

ifdef(‘unix’, ‘define(wordsize,16)’)
ifdef(‘gcos’, ‘define(wordsize,36)’)


makes a definition appropriate for the partic- 
ular machine. Don’t forget the quotes! 


ifdef actually permits three arguments; 
if the name is undefined, the value of ifdef 
is then the third argument, as in 


ifdef(‘unix’, on UNIX, not on UNIX)


Arguments 


So far we have discussed the simplest 
form of macro processing — replacing one 
string by another (fixed) string. User- 
defined macros may also have arguments, so 
different invocations can have different 
results. Within the replacement text for a 
macro (the second argument of its define) 
any occurrence of $n will be replaced by the 
nth argument when the macro is actually 
used. Thus, the macro bump, defined as 


define(bump, $1 = $1 + 1) 


generates code to increment its argument by
1:


bump(x)


is


x = x + 1


A macro can have as many arguments
as you want, but only the first nine are
accessible, through $1 to $9. (The macro
name itself is $0, although that is less commonly
used.) Arguments that are not supplied
are replaced by null strings, so we can
define a macro cat which simply concatenates
its arguments, like this:


define(cat, $1$2$3$4$5$6$7$8$9) 
Thus 
cat(x, y, z) 
is equivalent to 
xyz 
$4 through $9 are null, since no corresponding
arguments were provided.
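The substitution rule above is mechanical enough to sketch directly. The Python function below is hypothetical, not part of M4. $0 becomes the macro name, $1 through $9 become the corresponding arguments, and a missing argument becomes the null string. Because only one digit follows the $, a tenth argument cannot be reached, which matches the limit of nine.

```python
import re


def substitute(body, name, args):
    """Replace $0..$9 in a macro body with the name and the actual arguments."""
    actual = [name] + list(args)

    def repl(m):
        n = int(m.group(1))
        return actual[n] if n < len(actual) else ""   # unsupplied: null string

    return re.sub(r"\$([0-9])", repl, body)
```

For instance, substitute("$1 = $1 + 1", "bump", ["x"]) gives "x = x + 1". With the cat body and the arguments x, y, z it gives "xyz".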


Leading unquoted blanks, tabs, or
newlines that occur during argument collection
are discarded. All other white space is
retained. Thus


define(a,   b   c)


defines a to be b   c.


Arguments are separated by commas,
but parentheses are counted properly, so a
comma ‘‘protected’’ by parentheses does not
terminate an argument. That is, in


define(a, (b,c))


there are only two arguments; the second is
literally (b,c). And of course a bare comma
or parenthesis can be inserted by quoting it.
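These collection rules can be sketched as a small splitter. Assumptions: the hypothetical function collect_args receives the text between a macro's outer parentheses, and the quotes are the ASCII backquote and apostrophe that M4 reads by default (typeset here as ‘ and ’). Commas split arguments only at parenthesis depth zero and outside quotes. Leading white space of each argument is dropped, and one level of quotes is removed.

```python
def collect_args(text, lq="`", rq="'"):
    """Split the text inside a macro call's parentheses into arguments."""
    args, cur = [], []
    depth = 0          # nesting of unquoted parentheses
    qdepth = 0         # nesting of quotes
    at_start = True    # still inside an argument's leading white space?
    for ch in text:
        if qdepth == 0 and at_start and ch in " \t\n":
            continue                      # leading unquoted white space: discard
        at_start = False
        if ch == lq:
            qdepth += 1
            if qdepth == 1:
                continue                  # strip the outermost left quote
        elif ch == rq and qdepth > 0:
            qdepth -= 1
            if qdepth == 0:
                continue                  # strip the outermost right quote
        elif qdepth == 0 and ch == "(":
            depth += 1
        elif qdepth == 0 and ch == ")":
            depth -= 1
        elif qdepth == 0 and depth == 0 and ch == ",":
            args.append("".join(cur))     # an unprotected comma ends an argument
            cur, at_start = [], True
            continue
        cur.append(ch)
    args.append("".join(cur))
    return args
```

Applied to the two examples above, "a,   b   c" gives ["a", "b   c"], and "a, (b,c)" gives ["a", "(b,c)"].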


Arithmetic Built-ins 


M4 provides two built-in functions for 
doing arithmetic on integers (only). The 
simplest is incr, which increments its
numeric argument by 1. Thus to handle the
common programming situation where you
want a variable to be defined as ‘‘one more
than N’’, write


define(N, 100) 
define(N1, ‘incr(N)’) 


Then N1 is defined as one more than the 
current value of N. 


The more general mechanism for 
arithmetic is a built-in called eval, which is
capable of arbitrary arithmetic on integers. 
It provides the operators (in decreasing 
order of precedence) 


unary + and -
** or ^ (exponentiation)
* / % (modulus)
+ -
== != < <= > >=
! (not)
& or && (logical and)
| or || (logical or)


Parentheses may be used to group opera- 
tions where needed. All the operands of an 
expression given to eval must ultimately be 
numeric. The numeric value of a true rela- 
tion (like 1>0) is 1, and false is 0. The 
precision in eval is 32 bits on UNIX and 36 
bits on GCOS. 


As a simple example, suppose we want
M to be 2**N+1. Then


define(N, 3)
define(M, ‘eval(2**N+1)’)


As a matter of principle, it is advisable to 
quote the defining text for a macro unless it 
is very simple indeed (say just a number); it 
usually gives the result you want, and is a 
good habit to get into. 
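To make the precedence table concrete, here is a recursive-descent evaluator in Python. It follows the table exactly as printed, including the unusual position of ! between the relations and &&. True relations yield 1, false ones 0, and the result is reduced to 32 signed bits as on UNIX. Division truncates toward zero as in C. Several details are our own choices, not taken from M4's source: the names m4_eval and Parser, the right associativity of **, the rejection of negative exponents, and wrapping only the final result.

```python
import operator
import re

TOKEN = re.compile(r"\d+|\*\*|==|!=|<=|>=|&&|\|\||[-+*/%^<>!&|()]")


def c_div(a, b):
    """Integer division truncating toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def either(a, b):
    return bool(a) or bool(b)


def both(a, b):
    return bool(a) and bool(b)


# Binary levels from lowest to highest precedence, as in the table above.
LEVELS = [
    {"|": either, "||": either},
    {"&": both, "&&": both},
    "not",                                  # prefix ! sits at this level
    {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
     "<=": operator.le, ">": operator.gt, ">=": operator.ge},
    {"+": operator.add, "-": operator.sub},
    {"*": operator.mul, "/": c_div, "%": lambda a, b: a - c_div(a, b) * b},
]


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        if "".join(self.tokens) != "".join(text.split()):
            raise ValueError("bad character in expression")
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def level(self, n):
        if n == len(LEVELS):
            return self.power()
        if LEVELS[n] == "not":
            if self.peek() == "!":
                self.take()
                return int(not self.level(n))
            return self.level(n + 1)
        ops = LEVELS[n]
        v = self.level(n + 1)
        while self.peek() in ops:           # left-associative binary operators
            op = ops[self.take()]
            v = int(op(v, self.level(n + 1)))
        return v

    def power(self):                        # ** or ^, taken as right associative
        base = self.unary()
        if self.peek() in ("**", "^"):
            self.take()
            exp = self.power()
            if exp < 0:
                raise ValueError("negative exponent")
            return base ** exp
        return base

    def unary(self):                        # unary + and -, parentheses, numbers
        tok = self.take()
        if tok in ("+", "-"):
            v = self.unary()
            return -v if tok == "-" else v
        if tok == "(":
            v = self.level(0)
            if self.take() != ")":
                raise ValueError("missing )")
            return v
        if tok is not None and tok.isdigit():
            return int(tok)
        raise ValueError("unexpected token: %r" % (tok,))


def m4_eval(expr):
    """Evaluate an integer expression, reduced to a signed 32-bit value."""
    p = Parser(expr)
    value = p.level(0)
    if p.peek() is not None:
        raise ValueError("trailing input in expression")
    return (value + 2**31) % 2**32 - 2**31
```

With N defined as 3, the example above asks for m4_eval("2**3+1"), which is 9.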


File Manipulation 


You can include a new file in the input 
at any time by the built-in function include: 


include(filename) 


inserts the contents of filename in place of 
the include command. The contents of the 
file are often a set of definitions. The value
of include (that is, its replacement text) is 
the contents of the file; this can be captured 
in definitions, etc. 


It is a fatal error if the file named in 
include cannot be accessed. To get some 
control over this situation, the alternate 
form sinclude can be used; sinclude 
(‘‘silent include’’) says nothing and contin- 
ues if it can’t access the file. 


It is also possible to divert the output 
of M4 to temporary files during processing, 
and output the collected material upon com- 
mand. M4 maintains nine of these diver- 
sions, numbered 1 through 9. If you say


divert(n)


all subsequent output is put onto the end of 
a temporary file referred to as n. Diverting 
to this file is stopped by another divert command;
in particular, divert or divert(0)
resumes the normal output process. 


Diverted text is normally output all at
once at the end of processing, with the
diversions output in numeric order. It is
possible, however, to bring back diversions 
at any time, that is, to append them to the 
current diversion. 


undivert 


brings back all diversions in numeric order, 
and undivert with arguments brings back 
the selected diversions in the order given. 
The act of undiverting discards the diverted 
stuff, as does diverting into a diversion 
whose number is not between 0 and 9 
inclusive. 


The value of undivert is not the
diverted stuff. Furthermore, the diverted 
material is not rescanned for macros. 


The built-in divnum returns the 
number of the currently active diversion. 
This is zero during normal processing. 
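The bookkeeping behind divert, undivert, and divnum can be modeled with a few buffers. In this Python sketch (the class and method names are hypothetical), diversion 0 is normal output, 1 through 9 are holding buffers, and text sent to any other number is thrown away. Undiverting appends a buffer to the current output and empties it without rescanning. At the end, whatever remains is emitted in numeric order. Silently skipping an undivert of the current or an out-of-range diversion is our assumption.

```python
class Diversions:
    """Model of M4 output diversions: 0 is normal output, 1..9 are buffers."""

    def __init__(self):
        self.output = []                              # diversion 0
        self.buffers = {n: [] for n in range(1, 10)}
        self.current = 0                              # what divnum returns

    def divert(self, n=0):
        self.current = n

    def write(self, text):
        if self.current == 0:
            self.output.append(text)
        elif 1 <= self.current <= 9:
            self.buffers[self.current].append(text)
        # any other diversion number: the text is discarded

    def undivert(self, *numbers):
        """Append the chosen diversions (all, in order, if none are given)
        to the current output and empty them; nothing is rescanned."""
        for n in numbers or range(1, 10):
            if n == self.current or n not in self.buffers:
                continue                              # assumption: ignore these
            text = "".join(self.buffers[n])
            self.buffers[n] = []
            self.write(text)

    def finish(self):
        """End of input: remaining diversions are output in numeric order."""
        self.divert(0)
        self.undivert()
        return "".join(self.output)
```

Writing "a", then "two" into diversion 2, "one" into diversion 1, "lost" into diversion -1, and finally "b" to normal output, produces "abonetwo" at the end.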


System Command 


You can run any program in the local
operating system with the syscmd built-in.
For example,


syscmd(date)


on UNIX runs the date command. Normally
syscmd would be used to create a file for a
subsequent include. 


To facilitate making unique file names, 
the built-in maketemp is provided, with 
specifications identical to the system func- 
tion mktemp: a string of XXXXX in the 
argument is replaced by the process id of the 
current process. 


Conditionals 


There is a built-in called ifelse which 
enables you to perform arbitrary conditional 
testing. In the simplest form, 


ifelse(a, b, c, d) 


compares the two strings a and b. If these 
are identical, ifelse returns the string c; otherwise
it returns d. Thus we might define a
macro called compare which compares two 
strings and returns ‘‘yes’’ or ‘‘no’’ if they 
are the same or different. 


define(compare, ‘ifelse($1, $2, yes, no)’) 


Note the quotes, which prevent too-early 
evaluation of ifelse. 


If the fourth argument is missing, it is 
treated as empty. 


ifelse can actually have any number of 
arguments, and thus provides a limited form 
of multi-way decision capability. In the 
input 


ifelse(a, b, c, d, e, f, g)


if the string a matches the string b, the 
result is c. Otherwise, if d is the same as e, 
the result is f. Otherwise the result is g. If 
the final argument is omitted, the result is 
null, so 


ifelse(a, b, c) 


is c if a matches b, and null otherwise. 
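The multi-way form reads naturally as a loop over groups of three arguments. The Python function below is a hypothetical model of ifelse, not M4's code. While at least three arguments remain, it compares the first two and returns the third on a match. Otherwise it drops those three and continues. A single leftover argument is the default, and anything else yields the null string (the treatment of a leftover pair is our assumption).

```python
def ifelse(*args):
    """Multi-way string comparison modeled on M4's ifelse."""
    while len(args) >= 3:
        a, b, then = args[0], args[1], args[2]
        if a == b:
            return then
        args = args[3:]            # go on to the next (d, e, f, ...) group
    return args[0] if len(args) == 1 else ""
```

So ifelse("x", "x", "yes", "no") is "yes", and ifelse("a", "b", "c") is the null string.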


String Manipulation 


The built-in len returns the length of
the string that makes up its argument. Thus


len(abcdef)


is 6, and len((a,b)) is 5.


The built-in substr can be used to pro- 
duce substrings of strings. substr(s, i, n) 
returns the substring of s that starts at the 
ith position (origin zero), and is n charac- 
ters long. If n is omitted, the rest of the 
string is returned, so


substr(‘now is the time’, 1)


is


ow is the time


If i or n are out of range, various sensible
things happen. 


index(s1, s2) returns the index (posi-
tion) in s1 where the string s2 occurs, or
-1 if it doesn’t occur. As with substr, the
origin for strings is 0. 
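The three string built-ins map closely onto Python string operations, which makes the origin-zero convention easy to check. The function names below are hypothetical. The out-of-range behavior of substr here (return the null string, clamp a negative length) is one 'sensible' choice we made, not necessarily M4's.

```python
def m4_len(s):
    """Length of the argument string."""
    return len(s)


def substr(s, i, n=None):
    """n characters of s starting at position i (origin zero);
    the rest of s if n is omitted."""
    if i < 0 or i >= len(s):
        return ""                      # assumption: out of range gives null
    return s[i:] if n is None else s[i:i + max(n, 0)]


def index(s1, s2):
    """Position of s2 in s1 (origin zero), or -1 if it does not occur."""
    return s1.find(s2)
```

These reproduce the examples above: m4_len("(a,b)") is 5, and substr("now is the time", 1) is "ow is the time".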


The built-in translit performs charac- 
ter transliteration. 
translit(s, f, t) 


modifies s by replacing any character found 
in f by the corresponding character of t. 
That is,


translit(s, aeiou, 12345) 


replaces the vowels by the corresponding 
digits. If t is shorter than f, characters 
which don’t have an entry in t are deleted;
as a limiting case, if t is not present at all, 
characters from f are deleted from s. So 


translit(s, aeiou) 


deletes vowels from s. 
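Transliteration, including the deletion rule, is compactly expressed as a lookup table. In this hypothetical Python version, each character of f maps to the character in the same position of t. Characters of f beyond the end of t map to nothing and are deleted. Letting the first occurrence win when f repeats a character is our assumption.

```python
def translit(s, f, t=""):
    """Replace characters of s found in f by the corresponding character
    of t; characters of f with no counterpart in t are deleted."""
    table = {}
    for pos, ch in enumerate(f):
        # assumption: if f repeats a character, its first occurrence wins
        table.setdefault(ch, t[pos] if pos < len(t) else None)
    return "".join(table.get(ch, ch) or "" for ch in s)
```

For example, translit("education", "aeiou", "12345") gives "2d5c1t34n", and translit("education", "aeiou") gives "dctn".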


There is also a built-in called dnl 
which deletes all characters that follow it up 
to and including the next newline; it is use- 
ful mainly for throwing away empty lines 
that otherwise tend to clutter up M4 output. 
For example, if you say 


define(N, 100) 
define(M, 200) 
define(L, 300) 


the newline at the end of each line is not 
part of the definition, so it is copied into the 
output, where it may not be wanted. If you 
add dnl to each of these lines, the newlines 
will disappear. 
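The effect of dnl on the text can be imitated with a regular expression. This Python sketch shows only which characters disappear. In M4, dnl is a built-in recognized during scanning like any other macro name, and this version does not process definitions. We also assume dnl counts only as a whole alphanumeric token, so a name like dnlx is left alone. The function name is hypothetical.

```python
import re

# dnl as a separate alphanumeric token, then everything through the newline.
DNL = re.compile(r"(?<![A-Za-z0-9_])dnl(?![A-Za-z0-9_])[^\n]*\n?")


def apply_dnl(text):
    """Delete each dnl and the characters after it, up to and including
    the next newline."""
    return DNL.sub("", text)
```

Applied to "define(N, 100)dnl\ndefine(M, 200)dnl\nx\n", it leaves "define(N, 100)define(M, 200)x\n", with the unwanted newlines gone.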


Another way to achieve this, due to J. 
E. Weythman, is 


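divert(-1)
define(...)
...
divert


Since diversion -1 is not between 0 and 9, everything sent to it, including the newlines after
the definitions, is thrown away.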


Printing 

The built-in errprint writes its argu- 
ments out on the standard error file. Thus 
you can say 


errprint(‘fatal error’) 


dumpdef is a debugging aid which 
dumps the current definitions of defined 
terms. If there are no arguments, you get 
everything; otherwise you get the ones you 
name as arguments. Don’t forget to quote 
the names! 


Summary of Built-ins 




changequote(L, R)
define(name, replacement)
divert(number)
divnum
dnl
dumpdef(‘name’, ‘name’, ...)
errprint(s, s, ...)
eval(numeric expression)
ifdef(‘name’, this if true, this if false)
ifelse(a, b, c, d)
include(file)
incr(number)
index(s1, s2)
len(string)
maketemp(...XXXXX...)
sinclude(file)
substr(string, position, number)
syscmd(s)
translit(str, from, to)
undefine(‘name’)
undivert(number,number,...)
Introduction


It is common practice to divide large programs into smaller, more manageable pieces. 
The pieces may require quite different treatments: some may need to be run through a macro 
processor, some may need to be processed by a sophisticated program generator (e.g., Yacc [1]
or Lex [2]). The outputs of these generators may then have to be compiled with special options
and with certain definitions and declarations. The code resulting from these transformations 
may then need to be loaded together with certain libraries under the control of special options. 
Related maintenance activities involve running complicated test scripts and installing validated 
modules. Unfortunately, it is very easy for a programmer to forget which files depend on 
which others, which files have been modified recently, and the exact sequence of operations 
needed to make or exercise a new version of the program. After a long editing session, one 
may easily lose track of which files have been changed and which object modules are still valid, 
since a change to a declaration can obsolete a dozen other files. Forgetting to compile a routine 
that has been changed or that uses changed declarations will result in a program that will not 
work, and a bug that can be very hard to track down. On the other hand, recompiling every- 
thing in sight just to be safe is very wasteful. 


The program described in this report mechanizes many of the activities of program 
development and maintenance. If the information on inter-file dependences and command 
sequences is stored in a file, the simple command 


make 


is frequently sufficient to update the interesting files, regardless of the number that have been 
edited since the last ‘‘make’’. In most cases, the description file is easy to write and changes 
infrequently. It is usually easier to type the make command than to issue even one of the 
needed operations, so the typical cycle of program development operations becomes 


think — edit — make — test . 


Make is most useful for medium-sized programming projects; it does not solve the prob- 
lems of maintaining multiple source versions or of describing huge programs. Make was 
designed for use on Unix, but a version runs on GCOS. 


Basic Features 


The basic operation of make is to update a target file by ensuring that all of the files on 
which it depends exist and are up to date, then creating the target if it has not been modified 
since its dependents were. Make does a depth-first search of the graph of dependences. The 
operation of the command depends on the ability to find the date and time that a file was last 
modified. 


To illustrate, let us consider a simple example: A program named prog is made by compiling
and loading three C-language files x.c, y.c, and z.c with the lS library. By convention, the
output of the C compilations will be found in files named x.o, y.o, and z.o. Assume that the 
files x.c and y.c share some declarations in a file named defs, but that z.c does not. That is, x.c 
and y.c have the line 


#include "defs"


The following text describes the relationships and operations:


prog: x.o y.o z.o
	cc x.o y.o z.o -lS -o prog


x.o y.o: defs


If this information were stored in a file named makefile, the command


make


would perform the operations needed to recreate prog after any changes had been made to any 
of the four source files x.c, y.c, z.c, or defs.


Make operates using three sources of information: a user-supplied description file (as 
above), file names and ‘‘last-modified’’ times from the file system, and built-in rules to bridge 
some of the gaps. In our example, the first line says that prog depends on three ‘‘.o’’ files. 
Once these object files are current, the second line describes how to load them to create prog. 
The third line says that x.o and y.o depend on the file defs. From the file system, make discov- 
ers that there are three ‘‘.c’’ files corresponding to the needed ‘‘.o’’ files, and uses built-in 
information on how to generate an object from a source file (i.e., issue a ‘‘cc -c’’ command).


The following long-winded description file is equivalent to the one above, but takes no 
advantage of make’s innate knowledge: 

prog: x.o y.o z.o
	cc x.o y.o z.o -lS -o prog
x.o: x.c defs
	cc -c x.c
y.o: y.c defs
	cc -c y.c
z.o: z.c
	cc -c z.c


If none of the source or object files had changed since the last time prog was made, all of 
the files would be current, and the command 


make 


would just announce this fact and stop. If, however, the defs file had been edited, x.c and y.c
(but not z.c) would be recompiled, and then prog would be created from the new ‘‘.o’’ files. If 
only the file y.c had changed, only it would be recompiled, but it would still be necessary to 
reload prog. 
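The depth-first update can be sketched in Python for this prog example. This is a model for illustration, not make's implementation, and it makes several simplifications. Modification times are plain integers. Commands are recorded rather than executed, and each recorded command is assumed to create its target at the current simulated time. There are no built-in rules and no cycle detection. The function and parameter names are hypothetical.

```python
def make(target, deps, commands, mtime, log, clock):
    """Bring target up to date, depth first; return its resulting time.

    deps:     target -> files it depends on
    commands: target -> command that creates it
    mtime:    file -> last-modified time (missing key: file does not exist)
    log:      receives the commands that would be executed
    clock:    one-element list holding the simulated current time
    """
    newest = 0
    for dep in deps.get(target, []):           # first update every dependent
        newest = max(newest, make(dep, deps, commands, mtime, log, clock))
    if target not in mtime or mtime[target] < newest:
        if target not in commands:
            raise RuntimeError("don't know how to make " + target)
        log.append(commands[target])
        clock[0] += 1
        mtime[target] = clock[0]               # assume the command made the file
    return mtime[target]
```

With everything current nothing is logged. After defs is touched, the log shows cc -c x.c, cc -c y.c, and the final load, in that order.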


If no target name is given on the make command line, the first target mentioned in the 
description is created; otherwise the specified targets are made. The command 


make x.o 


would recompile x.o if x.c or defs had changed.


If the file exists after the commands are executed, its time of last modification is used in 
further decisions; otherwise the current time is used. It is often quite useful to include rules 
with mnemonic names and commands that do not actually produce a file with that name. 
These entries can take advantage of make’s ability to generate files and substitute macros. 
Thus, an entry ‘‘save’’ might be included to copy a certain set of files, or an entry ‘‘cleanup”’ 
might be used to throw away unneeded intermediate files. In other cases one may maintain a 
zero-length file purely to keep track of the time at which certain actions were performed. This 
technique is useful for maintaining remote archives and listings. 


Make has a simple macro mechanism for substituting in dependency lines and command
strings. Macros are defined by command arguments or description file lines with embedded
equal signs. A macro is invoked by preceding the name by a dollar sign; macro names longer 
than one character must be parenthesized. The name of the macro is either the single character 
after the dollar sign or a name inside parentheses. The following are valid macro invocations: 


$(CFLAGS)   $2   $(xy)   $Z   $(Z)


The last two invocations are identical. $$ is a dollar sign. All of these macros are assigned 
values during input, as shown below. Four special macros change values during the execution 
of the command: $*, $@, $?, and $<. They will be discussed later. The following fragment
shows the use: 


OBJECTS = x.o y.o z.o
LIBES = -lS
prog: $(OBJECTS)
	cc $(OBJECTS) $(LIBES) -o prog


The command 
make 

loads the three object files with the lS library. The command
make "LIBES= -ll -lp"


loads them with both the Lex (‘‘-ll’’) and the portable (‘‘-lp’’) libraries, since macro
definitions on the command line override definitions in the description. (It is necessary to
quote arguments with embedded blanks in Unix commands.)
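The substitution and override rules can be sketched as follows. This hypothetical Python helper replaces $(NAME) and single-character $X with the macro's value, turns $$ into a dollar sign, and expands undefined macros to the null string. Command-line definitions win because they are merged in last. Whether make rescans a substituted value for further macros is not modeled here.

```python
import re

MACRO = re.compile(r"\$(?:\(([A-Za-z0-9_]+)\)|(.))", re.DOTALL)


def expand_macros(text, macros):
    """Substitute $(NAME) and $X; $$ is a dollar sign; undefined is null."""
    def repl(m):
        if m.group(1) is None and m.group(2) == "$":
            return "$"
        name = m.group(1) if m.group(1) is not None else m.group(2)
        return macros.get(name, "")
    return MACRO.sub(repl, text)


file_macros = {"OBJECTS": "x.o y.o z.o", "LIBES": "-lS"}
command_line = {"LIBES": "-ll -lp"}
effective = {**file_macros, **command_line}   # command line overrides the file
```

With these values, expanding "cc $(OBJECTS) $(LIBES) -o prog" through effective gives "cc x.o y.o z.o -ll -lp -o prog".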


The following sections detail the form of description files and the command line, and dis- 
cuss options and built-in rules in more detail. 


Description Files and Substitutions 


A description file contains three types of information: macro definitions, dependency 
information, and executable commands. There is also a comment convention: all characters
after a sharp (#) are ignored, as is the sharp itself. Blank lines and lines beginning with a
sharp are totally ignored. If a non-comment line is too long, it can be continued using a 
backslash. If the last character of a line is a backslash, the backslash, newline, and following 
blanks and tabs are replaced by a single blank. 
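These lexical conventions amount to a small preprocessing pass over the description file. The Python sketch below (the function name is hypothetical) does three things. It joins continued lines, replacing the backslash, the newline, and the following blanks and tabs with one blank. It strips comments from a sharp to the end of the line. It drops lines left blank. For simplicity it ignores the exception for a sharp quoted inside a command.

```python
import re


def logical_lines(text):
    """Join continued lines, strip # comments, and drop blank lines.

    Leading tabs are kept, since they mark command lines.
    """
    text = re.sub(r"\\\n[ \t]*", " ", text)    # backslash-newline + blanks -> one blank
    lines = []
    for line in text.split("\n"):
        line = line.split("#", 1)[0]           # the sharp and what follows are ignored
        if line.strip():
            lines.append(line.rstrip())
    return lines
```

A macro line continued with a backslash, followed by a comment-only line, a dependency line, and a tab-led command, comes out as three logical lines.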


A macro definition is a line containing an equal sign not preceded by a colon or a tab. 
The name (string of letters and digits) to the left of the equal sign (trailing blanks and tabs are 
stripped) is assigned the string of characters following the equal sign (leading blanks and tabs 
are stripped.) The following are valid macro definitions: 


2 = xyz
abc = -ll -ly -lp
LIBES = 


The last definition assigns LIBES the null string. A macro that is never explicitly defined has 
the null string as value. Macro definitions may also appear on the make command line (see 
below). 


Other lines give information about target files. The general form of an entry is: 


target1 [target2 ...] :[:] [dependent1 ...] [; commands] [# ...]
[(tab) commands] [# ...]


Items inside brackets may be omitted. Targets and dependents are strings of letters, digits,
periods, and slashes. (Shell metacharacters ‘‘*’’ and ‘‘?’’ are expanded.) A command is any
string of characters not including a sharp (except in quotes) or newline. Commands may 
appear either after a semicolon on a dependency line or on lines beginning with a tab
immediately following a dependency line.


A dependency line may have either a single or a double colon. A target name may appear 
on more than one dependency line, but all of those lines must be of the same (single or double 
colon) type. 


1. For the usual single-colon case, at most one of these dependency lines may have a com- 
mand sequence associated with it. If the target is out of date with any of the dependents 
on any of the lines, and a command sequence is specified (even a null one following a 
semicolon or tab), it is executed; otherwise a default creation rule may be invoked. 


2. In the double-colon case, a command sequence may be associated with each dependency
line; if the target is out of date with any of the files on a particular line, the associated 
commands are executed. A built-in rule may also be executed. This detailed form is of 
particular value in updating archive-type files. 
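The two cases differ only in how the dependency lines for one target are grouped. In this hypothetical Python sketch, single-colon lines are pooled, and their single command sequence runs if the target is older than any dependent on any line. Double-colon lines are checked one at a time, each with its own commands. An empty result in the single-colon case stands for "fall back to a default rule", which is not modeled. A missing target is treated as out of date.

```python
def commands_to_run(target, entries, mtime, double_colon):
    """Command sequences make would run for one target.

    entries: one (dependents, commands) pair per dependency line;
             commands is None when that line has no command sequence.
    """
    t = mtime.get(target, -1)

    def out_of_date(files):
        return target not in mtime or any(mtime.get(f, 0) > t for f in files)

    if double_colon:
        # each line is judged against its own dependents
        return [cmds for deps, cmds in entries
                if cmds is not None and out_of_date(deps)]
    with_cmds = [cmds for deps, cmds in entries if cmds is not None]
    if len(with_cmds) > 1:
        raise ValueError("too many command sequences for " + target)
    all_deps = [f for deps, cmds in entries for f in deps]
    return with_cmds if with_cmds and out_of_date(all_deps) else []
```

For an archive lib.a that is newer than b.o but older than a.o, the double-colon form runs only the commands on the a.o line.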


If a target must be created, the sequence of commands is executed. Normally, each com-
mand line is printed and then passed to a separate invocation of the Shell after substituting for 
macros. (The printing is suppressed in silent mode or if the command line begins with an @ 
sign). Make normally stops if any command signals an error by returning a non-zero error 
code. (Errors are ignored if the ‘‘-i’’ flag has been specified on the make command line, if
the fake target name ‘‘.IGNORE’’ appears in the description file, or if the command string in
the description file begins with a hyphen. Some Unix commands return meaningless status.)
Because each command line is passed to a separate invocation of the Shell, care must be taken 
with certain commands (e.g., chdir and Shell control commands) that have meaning only within 
a single Shell process; the results are forgotten before the next line is executed. 


Before issuing any command, certain macros are set. $@ is set to the name of the file to
be ‘‘made’’. $? is set to the string of names that were found to be younger than the target. If
the command was generated by an implicit rule (see below), $< is the name of the related file
that caused the action, and $* is the prefix shared by the current and the dependent file names.
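A small Python sketch shows how these values follow from the file times. It assumes a hypothetical dict mtime of modification times (a missing entry means the file does not exist) and takes the stem macro (written $* in make) to be the target name with its last suffix removed; it is an illustration, not make's own code.

```python
def automatic_macros(target, dependents, mtime, source=None):
    """Compute $@ and $?, plus $< and $* when an implicit rule supplied the commands."""
    t = mtime.get(target, float("-inf"))  # a missing target is older than everything
    macros = {
        "@": target,
        "?": " ".join(d for d in dependents if mtime.get(d, float("-inf")) > t),
    }
    if source is not None:
        macros["<"] = source                      # the file that caused the action
        macros["*"] = target.rsplit(".", 1)[0]    # shared prefix, e.g. "gram"
    return macros
```

For example, rebuilding gram.o from gram.y after only gram.y has changed gives $? = "gram.y" and $* = "gram".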

If a file must be made but there are no explicit commands or relevant built-in rules, the
commands associated with the name ‘‘.DEFAULT’’ are used. If there is no such name, make
prints a message and stops.
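The order of preference for a file that must be made can be written down directly. This Python sketch assumes a hypothetical dict rules from rule names (including ‘‘.DEFAULT’’) to command lists; the wording of the error is illustrative and is not make's actual message.

```python
def choose_commands(target, explicit, implicit_rule, rules):
    """Pick a command sequence: explicit, then built-in rule, then .DEFAULT."""
    if explicit is not None:
        return explicit                   # commands from the description file
    if implicit_rule is not None and implicit_rule in rules:
        return rules[implicit_rule]       # e.g. the ".c.o" rule
    if ".DEFAULT" in rules:
        return rules[".DEFAULT"]
    raise RuntimeError("no way to make " + target)  # make prints a message and stops
```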


Command Usage 
The make command takes four kinds of arguments: macro definitions, flags, description 
file names, and target file names. 
make [ flags ] [ macro definitions ] [ targets ]
The following summary of the operation of the command explains how these arguments are 
interpreted. 


First, all macro definition arguments (arguments with embedded equal signs) are analyzed 
and the assignments made. Command-line macros override corresponding definitions found in 
the description files. 


Next, the flag arguments are examined. The permissible flags are:

-i   Ignore error codes returned by invoked commands. This mode is entered if the fake
     target name ‘‘.IGNORE’’ appears in the description file.

-s   Silent mode. Do not print command lines before executing. This mode is also entered
     if the fake target name ‘‘.SILENT’’ appears in the description file.

-r   Do not use the built-in rules.

-n   No execute mode. Print commands, but do not execute them. Even lines beginning with
     an ‘‘@’’ sign are printed.

-t   Touch the target files (causing them to be up to date) rather than issue the usual
     commands.

-q   Question. The make command returns a zero or non-zero status code depending on
     whether the target file is or is not up to date.

-p   Print out the complete set of macro definitions and target descriptions.

-d   Debug mode. Print out detailed information on files and times examined.

-f   Description file name. The next argument is assumed to be the name of a description
     file. A file name of ‘‘-’’ denotes the standard input. If there are no ‘‘-f’’
     arguments, the file named makefile or Makefile in the current directory is read. The
     contents of the description files override the built-in rules if they are present.


Finally, the remaining arguments are assumed to be the names of targets to be made; they
are done in left to right order. If there are no such arguments, the first name in the description
files that does not begin with a period is ‘‘made’’.
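The argument-handling order above (macro definitions, then flags, then targets) can be sketched as follows. The function names classify_args and default_target are hypothetical, flags are kept as a set of letters so that ‘‘-ts’’ sets both t and s, and the sketch does not check which of makefile and Makefile exists.

```python
def classify_args(argv):
    """Split make's arguments into macros, flags, description files, and targets."""
    macros, flags, files, targets, rest = {}, set(), [], [], []
    for arg in argv:                      # pass 1: arguments with an '='
        if "=" in arg:
            name, value = arg.split("=", 1)
            macros[name.strip()] = value.strip()
        else:
            rest.append(arg)
    i = 0
    while i < len(rest):                  # pass 2: flags, then targets
        arg = rest[i]
        if arg == "-f" and i + 1 < len(rest):
            files.append(rest[i + 1])     # -f consumes the next argument
            i += 2
            continue
        if arg.startswith("-") and arg != "-":
            flags.update(arg[1:])         # "-ts" sets both t and s
        else:
            targets.append(arg)
        i += 1
    if not files:
        files = ["makefile", "Makefile"]  # whichever exists is read
    return macros, flags, files, targets


def default_target(names):
    """With no target arguments, the first name not beginning with a period is made."""
    return next(n for n in names if not n.startswith("."))
```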


Implicit Rules 


The make program uses a table of interesting suffixes and a set of transformation rules to
supply default dependency information and implied commands. (The Appendix describes these
tables and means of overriding them.) The default suffix list is:


.o    Object file
.c    C source file
.e    Efl source file
.r    Ratfor source file
.f    Fortran source file
.s    Assembler source file
.y    Yacc-C source grammar
.yr   Yacc-Ratfor source grammar
.ye   Yacc-Efl source grammar
.l    Lex source grammar


The following diagram summarizes the default transformation paths. If there are two paths 
connecting a pair of suffixes, the longer one is used only if the intermediate file exists or is 
named in the description. 


.c .r .e .f .s .y .yr .ye .l .d    transform directly into .o
.y .l                              transform into .c
.yr                                transform into .r
.ye                                transform into .e


If the file x.o were needed and there were an x.c in the description or directory, it would
be compiled. If there were also an x.l, that grammar would be run through Lex before
compiling the result. However, if there were no x.c but there were an x.l, make would discard
the intermediate C-language file and use the direct link in the graph above.
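The choice between the long and the direct path can be modeled as a recursive search that only passes through an intermediate file when that file is available. In this Python sketch the table INTO is a hypothetical encoding of the transformation paths, ordered like the default suffix list (.d is omitted); available stands for the set of names that exist or are named in the description.

```python
# Hypothetical encoding of the transformation paths: for each suffix, the
# suffixes that can be transformed into it, in suffix-list order.
INTO = {
    ".o": [".c", ".e", ".r", ".f", ".y", ".yr", ".ye", ".l", ".s"],
    ".c": [".y", ".l"],
    ".r": [".yr"],
    ".e": [".ye"],
}


def build_chain(stem, suffix, available):
    """Files used to make stem+suffix, ultimate source first.

    An intermediate such as x.c is entered only if it is in `available`;
    otherwise the search falls through to a direct link such as .l -> .o.
    A one-element result means no source was found.
    """
    for src in INTO.get(suffix, []):
        name = stem + src
        if name in available:
            return build_chain(stem, src, available) + [stem + suffix]
    return [stem + suffix]
```

With both x.c and x.l present the chain is x.l, x.c, x.o; with only x.l it is the direct x.l, x.o.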


It is possible to change the names of some of the compilers used in the default, or the flag 
arguments with which they are invoked by knowing the macro names used. The compiler 
names are the macros AS, CC, RC, EC, YACC, YACCR, YACCE, and LEX. The command 


make CC=newcc


will cause the ‘‘newcc’’ command to be used instead of the usual C compiler. The macros
CFLAGS, RFLAGS, EFLAGS, YFLAGS, and LFLAGS may be set to cause these commands
to be issued with optional flags. Thus,


make "CFLAGS=-O"


causes the optimizing C compiler to be used. 
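The effect of such overrides on a built-in rule can be illustrated with a simple expander. This Python sketch is an assumption-laden model, not make's macro processor: it handles $(NAME) and single-character $X references, substitutes an empty string for undefined names, and layers the sources so that the description file overrides the built-in table and the command line overrides both.

```python
import re


def effective_macros(builtin, description, command_line):
    """Later sources win: built-in table < description file < command line."""
    merged = dict(builtin)
    merged.update(description)
    merged.update(command_line)
    return merged


def expand(text, macros):
    """Replace $(NAME) and single-character $X references with their values."""
    def sub(m):
        name = m.group(1) if m.group(1) is not None else m.group(2)
        return macros.get(name, "")  # assumption: undefined macros expand to ""
    return re.sub(r"\$\(([^)]*)\)|\$(.)", sub, text)
```

For example, with CC=newcc on the command line and CFLAGS = -O in the description file, the ‘‘.c.o’’ command $(CC) $(CFLAGS) -c $< expands to newcc -O -c x.c when $< is x.c.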


Example 


As an example of the use of make, we will present the description file used to maintain 
the make command itself. The code for make is spread over a number of C source files and a 
Yacc grammar. The description file contains: 


# Description file for the Make command

P = und -3 | opr -r2    # send to GCOS to be printed
FILES = Makefile version.c defs main.c doname.c misc.c files.c dosys.c gram.y lex.c gcos.c
OBJECTS = version.o main.o doname.o misc.o files.o dosys.o gram.o
LIBES= -lS
LINT = lint -p
CFLAGS = -O

make: $(OBJECTS)
	cc $(CFLAGS) $(OBJECTS) $(LIBES) -o make
	@size make

$(OBJECTS): defs
gram.o: lex.c

cleanup:
	-rm *.o gram.c
	-du

install:
	@size make /usr/bin/make
	cp make /usr/bin/make ; rm make

print: $(FILES)    # print recently changed files
	pr $? | $P
	touch print

test:
	make -dp | grep -v TIME >1zap
	/usr/bin/make -dp | grep -v TIME >2zap
	diff 1zap 2zap
	rm 1zap 2zap

lint: dosys.c doname.c files.c main.c misc.c version.c gram.c
	$(LINT) dosys.c doname.c files.c main.c misc.c version.c gram.c
	rm gram.c

arch:
	ar uv /sys/source/s2/make.a $(FILES)


Make usually prints out each command before issuing it. The following output results from
typing the simple command

make

in a directory containing only the source and description file:


cc -c version.c
cc -c main.c
cc -c doname.c
cc -c misc.c
cc -c files.c
cc -c dosys.c
yacc gram.y
mv y.tab.c gram.c
cc -c gram.c
cc version.o main.o doname.o misc.o files.o dosys.o gram.o -lS -o make
13188+3348+3044 = 19580b = 046174b


Although none of the source files or grammars were mentioned by name in the description file,
make found them using its suffix rules and issued the needed commands. The string of digits
results from the ‘‘size make’’ command. Because that line begins with an @ sign in the
description file, the command line itself was not printed, so only the sizes are written.


The last few entries in the description file are useful maintenance sequences. The ‘‘print’’
entry prints only the files that have been changed since the last ‘‘make print’’ command. A
zero-length file print is maintained to keep track of the time of the printing; the $? macro in the
command line then picks up only the names of the files changed since print was touched. The
printed output can be sent to a different printer or to a file by changing the definition of the P
macro:


make print "P = opr -sp"
or
make print "P= cat >zap"


Suggestions and Warnings 


The most common difficulties arise from make’s specific meaning of dependency. If file
x.c has a ‘‘#include "defs"’’ line, then the object file x.o depends on defs; the source file x.c
does not. (If defs is changed, it is not necessary to do anything to the file x.c, while it is
necessary to recreate x.o.)
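The rule can be applied mechanically by scanning a C source file for quoted include lines and attaching them to the object file. In this Python sketch, the function object_dependencies is hypothetical; it considers only #include "..." lines (system headers in angle brackets are ignored) and does not follow nested includes.

```python
import re

# Quoted (local) include lines only; <...> system headers are ignored.
INCLUDE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def object_dependencies(c_name, c_text):
    """Return (object file, headers it depends on); the .c file gets no new dependency."""
    obj = c_name.rsplit(".", 1)[0] + ".o"
    return obj, INCLUDE.findall(c_text)
```

For x.c containing #include "defs", the result corresponds to the description line x.o: defs, which is the same form as the $(OBJECTS): defs line in the example above.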


To discover what make would do, the ‘‘-n’’ option is very useful. The command

make -n


orders make to print out the commands it would issue without actually taking the time to
execute them. If a change to a file is absolutely certain to be benign (e.g., adding a new definition
to an include file), the ‘‘-t’’ (touch) option can save a lot of time: instead of issuing a large
number of superfluous recompilations, make updates the modification times on the affected files.
Thus, the command


make -ts


(‘‘touch silently’’) causes the relevant files to appear up to date. Obvious care is necessary,
since this mode of operation subverts the intention of make and destroys all memory of the
previous relationships.
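Why touching works, and why it loses information, can be seen in a short Python sketch. It assumes the targets already exist and uses os.utime to set their modification time to the current time; the function names are hypothetical and the demo works in a temporary directory.

```python
import os
import tempfile
import time


def is_out_of_date(target, dependents):
    """True if any dependent was modified more recently than the target."""
    t = os.path.getmtime(target)
    return any(os.path.getmtime(d) > t for d in dependents)


def touch_targets(targets):
    """Sketch of -t: give each existing target a modification time of 'now'."""
    now = time.time()
    for name in targets:
        os.utime(name, (now, now))  # nothing is recompiled


def demo_touch():
    """x.o is older than defs; after touching x.o it no longer looks out of date."""
    with tempfile.TemporaryDirectory() as d:
        obj, hdr = os.path.join(d, "x.o"), os.path.join(d, "defs")
        for path in (obj, hdr):
            open(path, "w").close()
        os.utime(obj, (1000, 1000))  # pretend x.o was built long ago
        os.utime(hdr, (2000, 2000))  # and defs was edited afterwards
        before = is_out_of_date(obj, [hdr])
        touch_targets([obj])
        after = is_out_of_date(obj, [hdr])
    return before, after
```

After the touch, nothing records that x.o was once older than defs, which is the lost memory the warning refers to.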

The debugging flag (‘‘-d’’) causes make to print out a very detailed description of what it
is doing, including the file times. The output is verbose, and recommended only as a last
resort.
Appendix. Suffixes and Transformation Rules


The make program itself does not know what file name suffixes are interesting or how to
transform a file with one suffix into a file with another suffix. This information is stored in an
internal table that has the form of a description file. If the ‘‘-r’’ flag is used, this table is not
used.


The list of suffixes is actually the dependency list for the name ‘‘.SUFFIXES’’; make
looks for a file with any of the suffixes on the list. If such a file exists, and if there is a
transformation rule for that combination, make acts as described earlier. The transformation
rule names are the concatenation of the two suffixes. The name of the rule to transform a ‘‘.r’’
file to a ‘‘.o’’ file is thus ‘‘.r.o’’. If the rule is present and no explicit command sequence has
been given in the user’s description files, the command sequence for the rule ‘‘.r.o’’ is used. If
a command is generated by using one of these suffixing rules, the macro $* is given the value
of the stem (everything but the suffix) of the name of the file to be made, and the macro $< is
the name of the dependent that caused the action.
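The lookup just described, scanning the suffix list and concatenating suffixes into a rule name, can be sketched in Python. The function find_implicit_rule is hypothetical; it handles a single transformation step only and takes an exists predicate so it can be tried without real files.

```python
import os


def find_implicit_rule(target, suffixes, rules, exists=os.path.exists):
    """Scan the suffix list left to right for a source file with a matching rule.

    Returns (rule name, macros) for the first match, or None.
    """
    stem, dot, tail = target.rpartition(".")
    if not dot:
        return None
    tsuf = "." + tail
    for s in suffixes:
        rule = s + tsuf                  # e.g. ".r" + ".o" gives ".r.o"
        source = stem + s
        if rule in rules and exists(source):
            return rule, {"*": stem, "<": source, "@": target}
    return None
```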


The order of the suffix list is significant, since it is scanned from left to right, and the first
name that is formed that has both a file and a rule associated with it is used. If new names are
to be appended, the user can just add an entry for ‘‘.SUFFIXES’’ in his own description file;
the dependents will be added to the usual list. A ‘‘.SUFFIXES’’ line without any dependents
deletes the current list. (It is necessary to clear the current list if the order of names is to be
changed.)
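The append and clear behavior of ‘‘.SUFFIXES’’ lines amounts to the small function below. This Python sketch uses a hypothetical name and does not remove duplicates, since the text does not say how make treats a suffix listed twice.

```python
def apply_suffixes_line(current, dependents):
    """A '.SUFFIXES' line appends its dependents; one with none clears the list."""
    if not dependents:
        return []
    return current + list(dependents)
```

Reordering therefore takes two lines in a description file: an empty ‘‘.SUFFIXES:’’ followed by one listing the suffixes in the desired order.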


The following is an excerpt from the default rules file: 


.SUFFIXES : .o .c .e .r .f .y .yr .ye .l .s
YACC=yacc
YACCR=yacc -r
YACCE=yacc -e
YFLAGS=
LEX=lex
LFLAGS=
CC=cc
AS=as -
CFLAGS=
RC=ec
RFLAGS=
EC=ec
EFLAGS=
FFLAGS=
.c.o :
	$(CC) $(CFLAGS) -c $<
.e.o .r.o .f.o :
	$(EC) $(RFLAGS) $(EFLAGS) $(FFLAGS) -c $<
.s.o :
	$(AS) -o $@ $<
.y.o :
	$(YACC) $(YFLAGS) $<
	$(CC) $(CFLAGS) -c y.tab.c
	rm y.tab.c
	mv y.tab.o $@
.y.c :
	$(YACC) $(YFLAGS) $<
	mv y.tab.c $@


